Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer
- URL: http://arxiv.org/abs/2403.03736v1
- Date: Wed, 6 Mar 2024 14:27:02 GMT
- Title: Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer
- Authors: Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma
- Abstract summary: This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in both perceptual quality and human evaluation.
- Score: 35.500720262253054
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent progress in generative compression technology has significantly
improved the perceptual quality of compressed data. However, these advancements
primarily focus on producing high-frequency details, often overlooking the
ability of generative models to capture the prior distribution of image
content, thus impeding further bitrate reduction in extreme compression
scenarios (<0.05 bpp). Motivated by the capabilities of predictive language
models for lossless compression, this paper introduces a novel Unified Image
Generation-Compression (UIGC) paradigm, merging the processes of generation and
compression. A key feature of the UIGC framework is the adoption of
vector-quantized (VQ) image models for tokenization, alongside a multi-stage
transformer designed to exploit spatial contextual information for modeling the
prior distribution. As such, the dual-purpose framework effectively utilizes
the learned prior for entropy estimation and assists in the regeneration of
lost tokens. Extensive experiments demonstrate the superiority of the proposed
UIGC framework over existing codecs in both perceptual quality and human evaluation,
particularly in ultra-low bitrate scenarios (<=0.03 bpp), pioneering a new
direction in generative compression.
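The dual use of the learned prior described above — entropy estimation for tokens that are sent, regeneration for tokens that are not — can be sketched in a few lines. This is a hypothetical illustration of the idea, not the paper's implementation; the function names, shapes, and the mode-based fallback are assumptions.

```python
import numpy as np

def token_bits(probs: np.ndarray, tokens: np.ndarray) -> float:
    """Total bits to entropy-code `tokens` under per-position `probs`.

    probs:  (N, K) predicted distribution over a K-entry VQ codebook.
    tokens: (N,)  integer indices of the tokens actually transmitted.
    """
    p = probs[np.arange(len(tokens)), tokens]
    return float(-np.log2(p).sum())

def regenerate(probs: np.ndarray) -> np.ndarray:
    """For tokens the encoder chose not to send, fall back to the prior's mode."""
    return probs.argmax(axis=1)
```

The sharper the prior's prediction at a position, the fewer bits that token costs — and the more plausibly the same distribution can fill in a skipped token, which is what lets the encoder drop tokens entirely at ultra-low bitrates.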
Related papers
- CALLIC: Content Adaptive Learning for Lossless Image Compression [64.47244912937204]
CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.
We propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations.
During encoding, we decompose pre-trained layers, including depth-wise convolutions, using low-rank matrices, and then adapt the incremental weights on the testing image by Rate-guided Progressive Fine-Tuning (RPFT).
RPFT fine-tunes with a gradually increasing number of patches, sorted in descending order of estimated entropy, optimizing the learning process and reducing adaptation time.
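A minimal sketch of the two ideas just described — a frozen pre-trained weight augmented with trainable low-rank incremental factors, and an entropy-descending patch schedule — assuming a LoRA-style decomposition; names and initialization choices here are illustrative, not CALLIC's actual design.

```python
import numpy as np

def lowrank_adapt(W: np.ndarray, r: int, rng):
    """Freeze W; return trainable low-rank factors A (d_out, r), B (r, d_in)."""
    d_out, d_in = W.shape
    A = rng.normal(scale=0.01, size=(d_out, r))
    B = np.zeros((r, d_in))          # zero-init: adapted layer starts exactly at W
    return W, A, B

def adapted_forward(x, W, A, B):
    return (W + A @ B) @ x           # only A and B are tuned on the test image

def rpft_schedule(patch_entropies):
    """RPFT-style order: fine-tune on patches sorted by estimated entropy, descending."""
    return sorted(range(len(patch_entropies)),
                  key=lambda i: -patch_entropies[i])
```

High-entropy patches dominate the bitrate, so adapting on them first gives the largest rate savings per fine-tuning step.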
arXiv Detail & Related papers (2024-12-23T10:41:18Z)
- Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption [52.82508784748278]
This paper proposes a Controllable Generative Image Compression framework, termed Control-GIC.
Control-GIC is capable of fine-grained granularity adaption across a broad bitrate spectrum while ensuring high-fidelity, general-purpose compression.
We develop a conditional decoder that retrieves historic multi-granularity representations according to the encoded codes, and then reconstructs hierarchical features formulated as conditional probabilities.
arXiv Detail & Related papers (2024-06-02T14:22:09Z)
- Transferable Learned Image Compression-Resistant Adversarial Perturbations [66.46470251521947]
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks.
We introduce a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules.
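The key point — the attack must differentiate through the compressor as well as the classifier so the perturbation survives pre-processing — can be sketched with a toy differentiable "compressor" (a low-pass blur) and a linear score. This is an illustrative FGSM-style step under those stand-in assumptions, not the paper's pipeline.

```python
import numpy as np

KERNEL = np.array([0.25, 0.5, 0.25])   # toy "compressor": local averaging

def compress(x):
    """Stand-in for a learned compressor: differentiable low-pass filtering."""
    return np.convolve(x, KERNEL, mode="same")

def fgsm_through_compressor(x, w, eps):
    """One FGSM step on the composed score w . compress(x).

    Because compress is a convolution, the chain rule gives the input
    gradient as w correlated with the kernel; we step along its sign.
    """
    grad = np.convolve(w, KERNEL[::-1], mode="same")
    return x + eps * np.sign(grad)
```

An attack computed only on the raw classifier can be washed out by the compressor; differentiating through the full pipeline is what makes the perturbation compression-resistant.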
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
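The VQ step underlying such VQGAN-based codecs is a nearest-neighbor lookup: each latent vector is replaced by the index of its closest codebook entry, so the image travels as a short sequence of integers. A minimal sketch (shapes and names are illustrative):

```python
import numpy as np

def vq_encode(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map latents (N, d) to nearest-codebook-entry indices (N,)."""
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def vq_decode(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruction just looks the indices back up in the codebook."""
    return codebook[indices]
```

With a K-entry codebook, each token costs at most log2(K) bits before entropy coding, which is why an expressive codebook translates directly into low bitrates.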
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive entropy model.
We show that ConvNeXt-ChARM brings consistent and significant average BD-rate (PSNR) reductions of 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
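The BD-rate figures quoted above compare average bitrate at equal quality. A common way to compute the Bjøntegaard-delta rate fits log-rate as a cubic in PSNR and integrates the gap between the two fitted curves over the overlapping PSNR range; this sketch simplifies the interval handling.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average % bitrate change of `test` vs `anchor` (negative = savings)."""
    la, lt = np.log(rate_anchor), np.log(rate_test)
    pa = np.polyfit(psnr_anchor, la, 3)           # log-rate as cubic in PSNR
    pt = np.polyfit(psnr_test, lt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))    # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    return (np.exp((it - ia) / (hi - lo)) - 1) * 100
```

A codec that needs half the anchor's bitrate at every PSNR therefore reports a BD-rate of -50%.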
arXiv Detail & Related papers (2023-07-12T11:45:54Z)
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
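The INR pipeline's core idea — overfit a small coordinate model to one signal, then quantize its weights, which become the compressed file — can be sketched as follows. For brevity this uses a 1-D signal and a closed-form linear model on Fourier features rather than a trained MLP; all names and the quantization step size are illustrative assumptions.

```python
import numpy as np

def fourier_features(t, n_freq):
    """Coordinate embedding: sin/cos features at integer frequencies."""
    f = np.arange(1, n_freq + 1)
    return np.concatenate([np.sin(2 * np.pi * f * t[:, None]),
                           np.cos(2 * np.pi * f * t[:, None])], axis=1)

def fit_inr(signal, n_freq):
    """Overfit the coordinate model to this one signal (least squares)."""
    t = np.linspace(0, 1, len(signal), endpoint=False)
    w, *_ = np.linalg.lstsq(fourier_features(t, n_freq), signal, rcond=None)
    return w

def quantize(w, step=1e-2):
    """Uniform weight quantization: the lossy step before entropy coding."""
    return np.round(w / step) * step

def decode(w, n, n_freq):
    """The decoder only needs the (quantized) weights to resynthesize."""
    t = np.linspace(0, 1, n, endpoint=False)
    return fourier_features(t, n_freq) @ w
```

The rate is set by how many weights are kept and how coarsely they are quantized; quantization-aware retraining, as proposed above, compensates for the error that the rounding introduces.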
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
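The building block behind invertible frameworks of this kind is the coupling layer: its forward pass is exactly undone by its inverse, so the network itself loses no information. A minimal additive-coupling sketch (the split, the inner net, and the names are illustrative, not ILC's architecture):

```python
import numpy as np

def coupling_forward(x1, x2, net):
    """x1 passes through unchanged; x2 is shifted by a function of x1."""
    return x1, x2 + net(x1)

def coupling_inverse(y1, y2, net):
    """Subtracting the same shift recovers the input exactly."""
    return y1, y2 - net(y1)
```

Note that `net` can be arbitrarily complex and need not be invertible itself; invertibility comes from the coupling structure, which is what lets such a framework isolate and model the information genuinely lost to quantization.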
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
- Generalized Octave Convolutions for Learned Multi-Frequency Image Compression [20.504561050200365]
We propose the first learned multi-frequency image compression and entropy coding approach.
It builds on the recently developed octave convolutions to factorize the latents into high- and low-frequency (resolution) components.
We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
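An octave convolution keeps two feature streams — full-resolution high-frequency and half-resolution low-frequency — and mixes them through four convolution paths (H→H, L→H, L→L, H→L) with up/down-sampling on the cross paths. A hypothetical 1-D sketch of that structure (kernels and pooling choices are illustrative):

```python
import numpy as np

def down2(x):
    """Average-pool by a factor of 2 (high- to low-resolution path)."""
    return x.reshape(-1, 2).mean(axis=1)

def up2(x):
    """Nearest-neighbor upsample by 2 (low- to high-resolution path)."""
    return np.repeat(x, 2)

def octave_conv(x_h, x_l, k_hh, k_hl, k_lh, k_ll):
    """x_h: length-2n high-freq stream; x_l: length-n low-freq stream."""
    conv = lambda x, k: np.convolve(x, k, mode="same")
    y_h = conv(x_h, k_hh) + up2(conv(x_l, k_lh))        # H->H plus upsampled L->H
    y_l = conv(x_l, k_ll) + conv(down2(x_h), k_hl)      # L->L plus downsampled H->L
    return y_h, y_l
```

Because the low-frequency stream runs at half resolution, it costs a quarter of the computation and memory of a full-resolution stream in 2-D, which is where the efficiency of multi-frequency latents comes from.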
arXiv Detail & Related papers (2020-02-24T01:35:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.