Self-Organized Variational Autoencoders (Self-VAE) for Learned Image
Compression
- URL: http://arxiv.org/abs/2105.12107v2
- Date: Wed, 26 May 2021 21:34:05 GMT
- Title: Self-Organized Variational Autoencoders (Self-VAE) for Learned Image
Compression
- Authors: M. Akın Yılmaz, Onur Keleş, Hilal Güven, A. Murat Tekalp,
Junaid Malik, Serkan Kıranyaz
- Abstract summary: We propose a novel self-organized variational autoencoder architecture that benefits from stronger non-linearity.
The experimental results demonstrate that the proposed Self-VAE yields improvements in both rate-distortion performance and perceptual image quality.
- Score: 12.539504557044653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In end-to-end optimized learned image compression, it is standard practice to
use a convolutional variational autoencoder with generalized divisive
normalization (GDN) to transform images into a latent space. Recently,
Operational Neural Networks (ONNs) that learn the best non-linearity from a set
of alternatives, and their self-organized variants, Self-ONNs, that approximate
any non-linearity via Taylor series have been proposed to address the
limitations of convolutional layers and a fixed nonlinear activation. In this
paper, we propose to replace the convolutional and GDN layers in the
variational autoencoder with self-organized operational layers, and propose a
novel self-organized variational autoencoder (Self-VAE) architecture that
benefits from stronger non-linearity. The experimental results demonstrate that
the proposed Self-VAE yields improvements in both rate-distortion performance
and perceptual image quality.
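As a rough illustration of the self-organized operational layer mentioned in the abstract, the sketch below assumes the commonly described Self-ONN formulation, in which each layer output is a truncated Taylor (Maclaurin) series of the input: y = b + sum over q = 1..Q of (W_q correlated with x^q), where x^q is the element-wise q-th power. The function name `self_onn_layer`, the tensor layout, and the choice of Q are illustrative assumptions, not details taken from the paper; with Q = 1 the layer reduces to an ordinary convolutional layer.

```python
import numpy as np


def self_onn_layer(x, weights, bias):
    """Hypothetical Self-ONN style layer ("valid" padding, stride 1).

    x:       input of shape (C_in, H, W), assumed bounded (e.g. by tanh)
             so that higher powers stay well behaved.
    weights: shape (Q, C_out, C_in, k, k), one kernel bank per power q.
    bias:    shape (C_out,).
    """
    Q, c_out, c_in, k, _ = weights.shape
    _, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1

    # Element-wise powers x^1 ... x^Q, stacked to shape (Q, C_in, H, W).
    powers = np.stack([x ** q for q in range(1, Q + 1)])

    out = np.zeros((c_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = powers[:, :, i:i + k, j:j + k]  # (Q, C_in, k, k)
            # Sum over powers, input channels, and kernel positions.
            out[:, i, j] = np.einsum("qoikl,qikl->o", weights, patch)
    return out + bias[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.tanh(rng.normal(size=(3, 8, 8)))          # bounded input
    w = rng.normal(scale=0.1, size=(3, 4, 3, 3, 3))  # Q=3, C_out=4, k=3
    b = np.zeros(4)
    print(self_onn_layer(x, w, b).shape)  # (4, 6, 6)
```

The explicit loops favor readability over speed; a practical implementation would concatenate the powers along the channel axis and call a single optimized convolution.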
Related papers
- Uncertainty-Guided Selective Adaptation Enables Cross-Platform Predictive Fluorescence Microscopy [65.15943255667733]
We introduce Subnetwork Image Translation ADDA with automatic depth selection (SIT-ADDA-Auto).
We show that adapting only the earliest convolutional layers, while freezing deeper layers, yields reliable transfer.
Our results provide a design rule for label-free adaptation in microscopy and a recipe for field settings; the code is publicly available.
arXiv Detail & Related papers (2025-11-15T03:01:05Z) - Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost.
This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation.
Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z) - Discrete Variational Autoencoding via Policy Search [16.257957838291563]
Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency.
Discrete random variables do not allow for exact differentiable parameterization.
We propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder.
Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet.
arXiv Detail & Related papers (2025-09-29T12:44:05Z) - Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection [54.433899174017185]
Out-of-distribution (OOD) detection is crucial for building reliable machine learning models.
We propose a novel method called Knowledge Regularized Negative Feature Tuning (KR-NFT).
NFT applies distribution-aware transformations to pre-trained text features, effectively separating positive and negative features into distinct spaces.
When trained with few-shot samples from the ImageNet dataset, KR-NFT not only improves ID classification accuracy and OOD detection but also significantly reduces the FPR95 by 5.44%.
arXiv Detail & Related papers (2025-07-26T07:44:04Z) - AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z) - Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient
Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z) - Differentially Private Learning with Per-Sample Adaptive Clipping [8.401653565794353]
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
arXiv Detail & Related papers (2022-12-01T07:26:49Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions that describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from
Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on the insight that the rectified results of distorted images of the same scene from different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z) - Self-Supervised Variational Auto-Encoders [10.482805367361818]
We present a novel class of generative models, called self-supervised Variational Auto-Encoder (selfVAE).
This class of models allows both conditional and unconditional sampling while simplifying the objective function.
We present the performance of our approach on three benchmark image datasets (Cifar10, Imagenette64, and CelebA).
arXiv Detail & Related papers (2020-10-05T13:42:28Z) - Operational vs Convolutional Neural Networks for Image Denoising [25.838282412957675]
Convolutional Neural Networks (CNNs) have recently become a favored technique for image denoising due to their adaptive learning ability.
We propose a heterogeneous network model which allows greater flexibility for embedding additional non-linearity at the core of the data transformation.
An extensive set of comparative evaluations of ONNs and CNNs over two severe image denoising problems yields conclusive evidence that ONNs enriched by non-linear operators can achieve superior denoising performance over CNNs with both equivalent and well-known deep configurations.
arXiv Detail & Related papers (2020-09-01T12:15:28Z) - A Flexible Framework for Designing Trainable Priors with Adaptive
Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.