ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization
- URL: http://arxiv.org/abs/2603.02897v1
- Date: Tue, 03 Mar 2026 11:47:05 GMT
- Title: ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization
- Authors: Hao Cao, Chengbin Liang, Wenqi Guo, Zhijin Qin, Jungong Han,
- Abstract summary: We propose Progressive Generative Image Compression (ProGIC), a compact built on residual vector quantization (RVQ)<n>In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook.<n>We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPU and CPU-only devices.
- Score: 59.481950697968706
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in generative image compression (GIC) have delivered remarkable improvements in perceptual quality. However, many GICs rely on large-scale and rigid models, which severely constrain their utility for flexible transmission and practical deployment in low-bitrate scenarios. To address these issues, we propose Progressive Generative Image Compression (ProGIC), a compact codec built on residual vector quantization (RVQ). In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook. The resulting codewords sum to a coarse-to-fine reconstruction and a progressive bitstream, enabling previews from partial data. We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPUs and CPU-only devices. Experimental results show that ProGIC attains comparable compression performance compared with previous methods. It achieves bitrate savings of up to 57.57% on DISTS and 58.83% on LPIPS compared to MS-ILLM on the Kodak dataset. Beyond perceptual quality, ProGIC enables progressive transmission for flexibility, and also delivers over 10 times faster encoding and decoding compared with MS-ILLM on GPUs for efficiency.
Related papers
- GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization [0.0]
GANCompress is a novel neural compression framework that combines Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs)<n>We show that GANCompress achieves substantial improvement in compression efficiency -- reducing file sizes by up to 100x with minimal visual distortion.
arXiv Detail & Related papers (2025-05-19T00:18:27Z) - Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis.<n>We propose a Compression Distortion Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models.<n>Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time, and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z) - CALLIC: Content Adaptive Learning for Lossless Image Compression [64.47244912937204]
CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.<n>We propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations.<n>During encoding, we decompose pre-trained layers, including depth-wise convolutions, using low-rank matrices and then adapt the incremental weights on testing image by Rate-guided Progressive Fine-Tuning (RPFT)<n>RPFT fine-tunes with gradually increasing patches that are sorted in descending order by estimated entropy, optimizing learning process and reducing adaptation time.
arXiv Detail & Related papers (2024-12-23T10:41:18Z) - High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression-DHVC 2.0- introduces superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z) - Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation [52.82508784748278]
This paper proposes a Control Generative Image Compression framework, termed Control-GIC.<n>Control-GIC is capable of fine-grained adaption across a broad spectrum while ensuring high-fidelity and generality compression.<n>Our experiments show that Control-GIC allows highly flexible and controllable adaption where the results demonstrate its superior performance over recent state-of-the-art methods.
arXiv Detail & Related papers (2024-06-02T14:22:09Z) - Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantic, an image encoder generates an extremely compressed bitstream, and a decoder reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving perceptual 50%, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z) - Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
arXiv Detail & Related papers (2023-07-17T06:14:19Z) - Computationally-Efficient Neural Image Compression with Shallow Decoders [43.115831685920114]
This paper takes a step forward towards closing the gap in decoding complexity by using a shallow or even linear decoding transform resembling that of JPEG.
We exploit the often asymmetrical budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding.
arXiv Detail & Related papers (2023-04-13T03:38:56Z) - Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG
Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.