HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression
- URL: http://arxiv.org/abs/2404.13372v1
- Date: Sat, 20 Apr 2024 13:19:08 GMT
- Title: HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression
- Authors: Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang
- Abstract summary: HybridFlow combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extreme low bitrates.
Experimental results demonstrate superior performance across several datasets under extremely low bitrates.
- Score: 51.04820313355164
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstructions due to severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give poor-fidelity reconstructions because a limited set of codewords lacks the representation power to capture faithful details. We propose a novel dual-stream framework, HybridFlow, which combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extreme low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide high quality and clarity in reconstructed images. The continuous feature stream aims to maintain fidelity details. To achieve the ultra low bitrate, a masked token-based transformer is further proposed, where we transmit only a masked portion of the codeword indices and recover the missing indices through token generation guided by information from the continuous feature stream. We also develop a bridging correction network to merge the two streams in pixel decoding for final image reconstruction, where the continuous stream features rectify biases of the codebook-based pixel decoder to restore fidelity details in the reconstruction. Experimental results demonstrate superior performance across several datasets under extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.
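To make the bitrate effect of transmitting only part of the codeword indices concrete, the following is a rough back-of-the-envelope sketch, not the paper's actual rate computation. The function name `index_stream_bpp` and all parameter values (a 16x downsampling factor, a 1024-entry codebook, a 25% keep ratio) are hypothetical. It assumes fixed-length coding of indices and a mask pattern known to both encoder and decoder, and it ignores the bits spent on the continuous feature stream.

```python
import math


def index_stream_bpp(height, width, downsample=16, codebook_size=1024, keep_ratio=0.25):
    """Estimate bits per pixel for the codeword-index stream only.

    Assumptions (illustrative, not taken from the paper):
    - one codeword index per (downsample x downsample) pixel block,
    - each index is sent with a fixed-length code of ceil(log2(codebook_size)) bits,
    - only a `keep_ratio` fraction of indices is transmitted,
    - the mask pattern is shared in advance, so it costs no bits.
    """
    tokens = (height // downsample) * (width // downsample)
    kept = math.ceil(tokens * keep_ratio)
    bits_per_index = math.ceil(math.log2(codebook_size))
    return kept * bits_per_index / (height * width)


# Compare sending all indices with sending a quarter of them.
full = index_stream_bpp(256, 256, keep_ratio=1.0)
masked = index_stream_bpp(256, 256, keep_ratio=0.25)
print(f"all indices: {full:.6f} bpp, 25% of indices: {masked:.6f} bpp")
```

Under these assumptions the index stream shrinks in proportion to the keep ratio, which is why recovering the untransmitted indices at the decoder is what makes the scheme viable at very low rates.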
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality (
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption [57.056311855630916]
We propose a Controllable Generative Image Compression framework, Control-GIC.
It is capable of fine-grained adaption across a broad spectrum while ensuring high-fidelity and generality compression.
We develop a conditional decoder that can trace back to historic encoded multi-granularity representations.
arXiv Detail & Related papers (2024-06-02T14:22:09Z)
- Neural Image Compression Using Masked Sparse Visual Representation [17.229601298529825]
We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks.
By sharing codebooks with the decoder, the encoder transfers codeword indices that are efficient and cross-platform robust.
We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality.
arXiv Detail & Related papers (2023-09-20T21:59:23Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- Lossy Compression with Gaussian Diffusion [28.930398810600504]
We describe a novel lossy compression approach called DiffC which is based on unconditional diffusion generative models.
We implement a proof of concept and find that it works surprisingly well despite the lack of an encoder transform.
We show that a flow-based reconstruction achieves a 3 dB gain over ancestral sampling at high bitrates.
arXiv Detail & Related papers (2022-06-17T16:46:31Z)
- Progressive Neural Image Compression with Nested Quantization and Latent Ordering [16.871212593949487]
We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable compression by allowing scalable coding with a single bitstream.
To the best of our knowledge, PLONQ is the first learning-based progressive image coding scheme, and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
arXiv Detail & Related papers (2021-02-04T22:06:13Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)