UniMIC: Towards Universal Multi-modality Perceptual Image Compression
- URL: http://arxiv.org/abs/2412.04912v2
- Date: Mon, 09 Dec 2024 09:50:00 GMT
- Title: UniMIC: Towards Universal Multi-modality Perceptual Image Compression
- Authors: Yixin Gao, Xin Li, Xiaohan Pan, Runsen Feng, Zongyu Guo, Yiting Lu, Yulin Ren, Zhibo Chen
- Abstract summary: We present UniMIC, a universal multi-modality image compression framework. UniMIC aims to unify the rate-distortion-perception (RDP) optimization for multiple image codecs.
- Score: 21.370591256689885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present UniMIC, a universal multi-modality image compression framework that aims to unify the rate-distortion-perception (RDP) optimization for multiple image codecs simultaneously by exploiting cross-modality generative priors. Unlike most existing works, which need to design and optimize image codecs from scratch, our UniMIC introduces the visual codec repository, which incorporates a range of representative image codecs and directly uses them as the basic codecs for various practical applications. Moreover, we propose multi-grained textual coding, where variable-length content prompts and compression prompts are designed and encoded to assist perceptual reconstruction through multi-modality conditional generation. In particular, a universal perception compensator is proposed to improve the perceptual quality of decoded images from all basic codecs at the decoder side by reusing text-assisted diffusion priors from Stable Diffusion. With the cooperation of the above three strategies, our UniMIC achieves a significant improvement in RDP optimization across different compression codecs, e.g., traditional and learnable codecs, and across different compression costs, e.g., ultra-low bitrates. The code will be available at https://github.com/Amygyx/UniMIC .
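The decoding path the abstract describes (a basic codec reconstructs the image, then a text-conditioned compensator refines it) can be sketched schematically. Every name below is hypothetical, and the compensator is a stub, not the paper's Stable Diffusion-based module:

```python
class StubCodec:
    """Stand-in for any basic codec in the visual codec repository
    (traditional or learned); here decoding is just the identity."""
    def decode(self, bitstream):
        return list(bitstream)

def stub_compensator(coarse, content_prompt, compression_prompt):
    """Stand-in for the universal perception compensator; a real one
    would refine `coarse` conditioned on both text prompts."""
    return coarse

def unimic_decode(bitstream, codec, content_prompt, compression_prompt,
                  compensator):
    # 1) Any basic codec reconstructs a coarse image from the bitstream.
    coarse = codec.decode(bitstream)
    # 2) A compensator, conditioned on the content and compression
    #    prompts, improves perceptual quality at the decoder side.
    return compensator(coarse, content_prompt, compression_prompt)
```

The point of the sketch is the separation of concerns: the codec is swappable, and all perceptual improvement happens in one shared decoder-side module.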
Related papers
- ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization [59.481950697968706]
We propose Progressive Generative Image Compression (ProGIC), a compact codec built on residual vector quantization (RVQ). In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook. We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPU and CPU-only devices.
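The staged quantization the summary describes can be sketched in a few lines; this is a toy NumPy version of the general RVQ idea, not ProGIC's implementation:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: each stage quantizes the residual
    left over by the previous stage against its own codebook."""
    residual = np.asarray(x, dtype=float)
    indices = []
    for cb in codebooks:                           # one codebook per stage
        dists = np.linalg.norm(residual[None, :] - cb, axis=1)
        i = int(np.argmin(dists))                  # nearest code vector
        indices.append(i)
        residual = residual - cb[i]                # pass residual onward
    return indices

def rvq_decode(indices, codebooks):
    # Reconstruction is the sum of the selected code vectors.
    return sum(cb[i] for cb, i in zip(codebooks, indices))
```

Because each stage only has to represent what the previous stages missed, dropping the later indices still yields a coarse reconstruction, which is what makes the scheme progressive.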
arXiv Detail & Related papers (2026-03-03T11:47:05Z)
- Benchmarking and Enhancing VLM for Compressed Image Understanding [52.98037879935058]
Vision-Language Models (VLMs) predominantly digest and understand high-bitrate compressed images. Their ability to interpret low-bitrate compressed images remains largely unexplored. We introduce the first comprehensive benchmark to evaluate the ability of VLMs to understand compressed images.
arXiv Detail & Related papers (2025-12-24T02:59:01Z)
- Plug-and-Play Versatile Compressed Video Enhancement [57.62582951699999]
Video compression effectively reduces file sizes, making real-time cloud computing possible.
However, it comes at the cost of visual quality and challenges the robustness of downstream vision models.
We present a versatile enhancement framework that adaptively enhances videos under different compression settings.
arXiv Detail & Related papers (2025-04-21T18:39:31Z)
- Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis.
We propose a Compression Distortion Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models.
Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in execution time and parameter count.
arXiv Detail & Related papers (2025-03-27T13:01:53Z)
- REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder [52.698595889988766]
We present a novel perspective on learning video embedders for generative modeling.
Rather than requiring an exact reproduction of an input video, an effective embedder should focus on visually plausible reconstructions.
We propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework.
arXiv Detail & Related papers (2025-03-11T17:51:07Z)
- Exploiting Inter-Image Similarity Prior for Low-Bitrate Remote Sensing Image Compression [10.427300958330816]
We propose a codebook-based RS image compression (Code-RSIC) method with a generated discrete codebook.
Code-RSIC significantly outperforms state-of-the-art traditional and learning-based image compression algorithms in terms of perceptual quality.
arXiv Detail & Related papers (2024-07-17T03:33:16Z)
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information.
It can achieve optimal consistency and perception results while saving 50% of bitrate, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks [15.308823742699039]
We propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv).
To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced.
Experimental results show that the proposed framework, trained with the variable-rate objective function, outperforms standard codecs such as H.265/HEVC-based BPG as well as state-of-the-art learning-based variable-rate methods.
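Variable-rate objectives of this kind are typically a Lagrangian rate-distortion loss averaged over several trade-off points, so one model learns features useful at every bitrate. A minimal sketch, not the paper's exact formulation:

```python
def variable_rate_loss(rates, distortions, lambdas):
    """Lagrangian rate-distortion objective R + lambda * D, averaged
    over several operating points so a single model can serve
    multiple target bitrates."""
    assert len(rates) == len(distortions) == len(lambdas)
    terms = [r + lam * d for r, d, lam in zip(rates, distortions, lambdas)]
    return sum(terms) / len(terms)
```

Each lambda weights distortion against rate for one operating point; larger lambdas correspond to higher-quality, higher-bitrate reconstructions.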
arXiv Detail & Related papers (2020-12-31T06:26:56Z)
- How to Exploit the Transferability of Learned Image Compression to Conventional Codecs [25.622863999901874]
We show how learned image coding can be used as a surrogate to optimize an image for encoding.
Our approach can remodel a conventional image to reduce MS-SSIM distortion, achieving over 20% rate improvement without any decoding overhead.
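Using a learned codec as a surrogate amounts to gradient descent on the input image against a differentiable rate proxy, with a fidelity term keeping the result close to the original. A toy sketch, assuming a caller-supplied surrogate gradient (the quadratic proxy in the test is illustrative only, not any real codec's rate model):

```python
import numpy as np

def optimize_for_codec(x, surrogate_grad, lam=0.1, lr=0.05, steps=200):
    """Adjust image x by gradient descent on a differentiable surrogate
    of the codec's rate, plus a quadratic fidelity penalty that keeps
    x near the original; the codec itself is never differentiated."""
    orig = np.asarray(x, dtype=float).copy()
    x = orig.copy()
    for _ in range(steps):
        g = surrogate_grad(x) + lam * (x - orig)   # rate grad + fidelity grad
        x = x - lr * g
    return x
```

The optimized image is then fed to the unmodified conventional encoder, which is why no decoding overhead is incurred.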
arXiv Detail & Related papers (2020-12-03T12:34:51Z)
- Enhanced Standard Compatible Image Compression Framework based on Auxiliary Codec Networks [8.440333621142226]
We propose a novel standard-compatible image compression framework based on Auxiliary Codec Networks (ACNs).
ACNs are designed to imitate the image degradation operations of the existing codec, delivering more accurate gradients to the compact representation network.
We demonstrate that our proposed framework, based on the JPEG and High Efficiency Video Coding (HEVC) standards, substantially outperforms existing image compression algorithms in a standard-compatible manner.
arXiv Detail & Related papers (2020-09-30T15:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.