Neural Image Compression Using Masked Sparse Visual Representation
- URL: http://arxiv.org/abs/2309.11661v1
- Date: Wed, 20 Sep 2023 21:59:23 GMT
- Title: Neural Image Compression Using Masked Sparse Visual Representation
- Authors: Wei Jiang and Wei Wang and Yue Chen
- Abstract summary: We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks.
By sharing codebooks with the decoder, the encoder transfers codeword indices that are efficient and cross-platform robust.
We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality.
- Score: 17.229601298529825
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study neural image compression based on the Sparse Visual Representation
(SVR), where images are embedded into a discrete latent space spanned by
learned visual codebooks. By sharing codebooks with the decoder, the encoder
transfers integer codeword indices that are efficient and cross-platform
robust, and the decoder retrieves the embedded latent feature using the indices
for reconstruction. Previous SVR-based compression lacks an effective mechanism
for rate-distortion tradeoffs, where one can only pursue either high
reconstruction quality or low transmission bitrate. We propose a Masked
Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent
feature subspace to balance bitrate and reconstruction quality. A set of
semantic-class-dependent basis codebooks are learned and combined with weights
to generate a rich latent feature for high-quality reconstruction. The
combining weights are adaptively derived from each input image, providing
fidelity information at the cost of additional transmission bits. By masking out
unimportant weights in the encoder and recovering them in the decoder, we can
trade off reconstruction quality for transmission bits, and the masking rate
controls the balance between bitrate and distortion. Experiments over the
standard JPEG-AI dataset demonstrate the effectiveness of our M-AdaCode
approach.
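The masking-and-recovery mechanism described above can be sketched in code. The following is a minimal, hypothetical PyTorch illustration (the codebook count, latent shape, and the small recovery network are assumptions for illustration, not the authors' implementation): only the largest-magnitude combining weights are kept in the encoder, and the masked-out ones are re-estimated in the decoder before the basis-codebook features are combined.

```python
# Minimal sketch of weight masking and recovery in the spirit of M-AdaCode.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

K, C = 8, 64  # assumed number of basis codebooks and latent channels


def mask_weights(weights: torch.Tensor, keep: int):
    """Keep only the `keep` largest-magnitude combining weights per image.
    The masking rate (K - keep) / K trades transmission bits for fidelity."""
    idx = weights.abs().topk(keep, dim=-1).indices           # (B, keep)
    mask = torch.zeros_like(weights).scatter_(-1, idx, 1.0)  # (B, K) binary mask
    return weights * mask, mask


class WeightRecovery(nn.Module):
    """Decoder-side stand-in that re-estimates the masked-out weights
    from the weights that were actually transmitted."""

    def __init__(self, k: int = K):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * k, 64), nn.ReLU(), nn.Linear(64, k))

    def forward(self, masked_w, mask):
        pred = self.net(torch.cat([masked_w, mask], dim=-1))
        # Keep the transmitted weights; only fill in the masked positions.
        return masked_w + (1.0 - mask) * pred


# Usage: weighted combination of per-codebook features into one rich latent.
B = 2
weights = torch.rand(B, K)                       # adaptively derived per image
basis_feats = torch.randn(B, K, C, 16, 16)       # feature from each basis codebook
masked_w, mask = mask_weights(weights, keep=3)   # encoder: send 3 of 8 weights
recovered_w = WeightRecovery()(masked_w, mask)   # decoder: recover the rest
latent = (recovered_w[:, :, None, None, None] * basis_feats).sum(dim=1)  # (B, C, 16, 16)
```

Lowering `keep` transmits fewer weights (lower bitrate) but leaves more of the combination to the decoder-side estimate, which is exactly the rate-distortion knob the abstract describes.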
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
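A rough sketch of this "denoising as decoding" idea follows; module names and shapes are assumptions, the latent is assumed to be spatially aligned with the image, and the fixed-step refinement loop stands in for a full diffusion schedule.

```python
# Rough sketch of decoding by iterative refinement, conditioned on the encoder latent.
# Module names, shapes, and the fixed step count are illustrative assumptions.
import torch
import torch.nn as nn


class LatentConditionedDenoiser(nn.Module):
    """Refines the current image estimate given the encoder's latent."""

    def __init__(self, img_ch: int = 3, latent_ch: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + latent_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, img_ch, 3, padding=1),
        )

    def forward(self, x_t, latent):
        return self.net(torch.cat([x_t, latent], dim=1))


def decode_by_denoising(latent, denoiser, steps: int = 4):
    """Instead of one feed-forward reconstruction, start from noise and
    refine it repeatedly, guided by the latent at every step."""
    b, _, h, w = latent.shape
    x = torch.randn(b, 3, h, w)
    for _ in range(steps):
        x = denoiser(x, latent)  # each pass refines the previous estimate
    return x


# Usage on a dummy latent assumed to share the image's spatial resolution.
latent = torch.randn(1, 8, 32, 32)
image = decode_by_denoising(latent, LatentConditionedDenoiser())  # (1, 3, 32, 32)
```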
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network [10.427300958330816]
Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge.
We propose the invertible neural network-based remote sensing image compression (INN-RSIC) method.
Our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.
arXiv Detail & Related papers (2024-05-17T03:52:37Z)
- HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression [51.04820313355164]
HybridFlow combines a continuous-feature-based stream and a codebook-based stream to achieve both high perceptual quality and high fidelity at extremely low bitrates.
Experimental results demonstrate superior performance across several datasets at extremely low bitrates.
arXiv Detail & Related papers (2024-04-20T13:19:08Z)
- Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression [58.618625678054826]
This study presents an enhanced neural compression method designed for optimal visual fidelity.
We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss.
Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
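A hedged sketch of how such an ensemble loss might be assembled is shown below; the term weights, helper functions, and the non-saturating adversarial stand-in are assumptions, not the paper's exact formulation.

```python
# Illustrative composition of an ensemble loss from Charbonnier, perceptual,
# style, and adversarial terms. Weights and helpers are assumptions.
import torch
import torch.nn.functional as F


def charbonnier(x, y, eps: float = 1e-3):
    """Smooth L1-like reconstruction penalty."""
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()


def gram(feat):
    """Gram matrix of a (B, C, H, W) feature map, used by the style term."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def ensemble_loss(x_hat, x, feats_hat, feats, d_score, w=(1.0, 0.1, 0.05, 0.01)):
    """feats_hat / feats: lists of feature maps from a frozen perceptual network
    (e.g. a VGG-style backbone); d_score: discriminator output for x_hat."""
    rec = charbonnier(x_hat, x)
    perc = sum(F.l1_loss(a, b) for a, b in zip(feats_hat, feats))
    style = sum(F.l1_loss(gram(a), gram(b)) for a, b in zip(feats_hat, feats))
    adv = F.softplus(-d_score).mean()  # non-saturating stand-in for the adversarial term
    return w[0] * rec + w[1] * perc + w[2] * style + w[3] * adv
```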
arXiv Detail & Related papers (2024-01-25T08:11:27Z)
- An Efficient Implicit Neural Representation Image Codec Based on Mixed Autoregressive Model for Low-Complexity Decoding [43.43996899487615]
Implicit Neural Representation (INR) for image compression is an emerging technology that offers two key benefits compared to cutting-edge autoencoder models.
We introduce a new Mixed AutoRegressive Model (MARM) to significantly reduce the decoding time of current INR codecs.
MARM includes our proposed AutoRegressive Upsampler (ARU) blocks, which are highly efficient, and ARM from previous work to balance decoding time and reconstruction quality.
arXiv Detail & Related papers (2024-01-23T09:37:58Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
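The vector-quantization step that such codebook-based codecs (and SVR-based compression in general) rely on can be sketched as follows; the codebook size and dimensions are arbitrary placeholders.

```python
# Minimal sketch of codebook quantization: each latent vector is replaced by its
# nearest codeword, so only integer indices need to be transmitted.
import torch


def vq_encode(latents: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """latents: (N, D) continuous vectors; codebook: (K, D) learned codewords.
    Returns the (N,) integer indices the encoder sends to the decoder."""
    dists = torch.cdist(latents, codebook)  # (N, K) pairwise L2 distances
    return dists.argmin(dim=1)


def vq_decode(indices: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """The decoder looks the codewords back up from the shared codebook."""
    return codebook[indices]


# Usage: 256 latent vectors quantized against a 1024-entry codebook.
codebook = torch.randn(1024, 64)
z = torch.randn(256, 64)
idx = vq_encode(z, codebook)      # integers are cheap and cross-platform robust
z_hat = vq_decode(idx, codebook)  # (256, 64) quantized latent for reconstruction
```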
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
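One plausible way to penalize bottleneck redundancy is sketched below; this decorrelation-style formulation is an assumption for illustration and may differ from the paper's actual scheme.

```python
# Illustrative redundancy penalty on an autoencoder bottleneck: decorrelate the
# features by penalizing off-diagonal entries of their correlation matrix.
# This is an assumed formulation, not necessarily the paper's exact scheme.
import torch


def redundancy_penalty(z: torch.Tensor) -> torch.Tensor:
    """z: (batch, dim) bottleneck activations."""
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-8)  # standardize each feature
    corr = (z.T @ z) / z.shape[0]                    # (dim, dim) correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum() / (z.shape[1] * (z.shape[1] - 1))


# Added to the usual reconstruction objective with a small weight, e.g.
# loss = F.mse_loss(x_hat, x) + 0.01 * redundancy_penalty(z)
```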
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.