MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
- URL: http://arxiv.org/abs/2402.16749v3
- Date: Wed, 17 Apr 2024 14:06:28 GMT
- Title: MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
- Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang
- Abstract summary: This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information.
It can achieve optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication.
- Score: 78.4051835615796
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs) content. It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication. The code will be released on https://github.com/lcysyzxdxc/MISC.
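The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MiscPayload`, `compress`, `decompress`, and `bits_per_pixel`, the choice to pass the extracted text to the map encoder, and the stub encoders are all hypothetical placeholders chosen for illustration. The sketch only shows how three encoder outputs could be bundled, handed to a decoder, and counted toward a bitrate.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MiscPayload:
    """Everything sent to the decoder (hypothetical container, not the MISC API)."""
    semantics: bytes   # text from the LMM encoder, UTF-8 encoded
    region_map: bytes  # output of the map encoder
    bitstream: bytes   # output of the image encoder

    def total_bits(self) -> int:
        # Sum the sizes of all three channels, in bits.
        return 8 * (len(self.semantics) + len(self.region_map) + len(self.bitstream))


def bits_per_pixel(payload: MiscPayload, width: int, height: int) -> float:
    """Bitrate of the payload relative to the image resolution."""
    return payload.total_bits() / (width * height)


def compress(image: Any,
             lmm_encoder: Callable[[Any], str],
             map_encoder: Callable[[Any, str], bytes],
             image_encoder: Callable[[Any], bytes]) -> MiscPayload:
    """Run the three encoders named in the abstract and bundle their outputs."""
    text = lmm_encoder(image)              # semantic description of the image
    region_map = map_encoder(image, text)  # assumption: the map encoder also sees the text
    bitstream = image_encoder(image)       # extremely compressed image representation
    return MiscPayload(text.encode("utf-8"), region_map, bitstream)


def decompress(payload: MiscPayload,
               decoder: Callable[[str, bytes, bytes], Any]) -> Any:
    """Reconstruct the image from the three channels."""
    return decoder(payload.semantics.decode("utf-8"),
                   payload.region_map,
                   payload.bitstream)


# Stub components, standing in for real models (assumptions for the demo only).
demo_payload = compress(
    image=None,
    lmm_encoder=lambda img: "a dog",           # 5 bytes of text
    map_encoder=lambda img, text: bytes(3),    # 3-byte placeholder map
    image_encoder=lambda img: bytes(8),        # 8-byte placeholder bitstream
)
demo_bpp = bits_per_pixel(demo_payload, width=32, height=32)  # 128 bits / 1024 px = 0.125
```

Counting all three channels in `total_bits` is a design choice of this sketch: any bitrate comparison has to include the text and map side information, not only the image bitstream.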
Related papers
- CMC-Bench: Towards a New Paradigm of Visual Signal Compression [85.1839779884282]
We introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression.
At ultra-low bitrates, this paper shows that the combination of some I2T and T2I models has surpassed the most advanced visual signal protocols.
arXiv Detail & Related papers (2024-06-13T17:41:37Z)
- UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation [59.3877309501938]
Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios.
We introduce a codebook containing frequency domain information as a prior input to the INR network.
This enhances the representational power of INR and provides distinctive conditioning for different image blocks.
arXiv Detail & Related papers (2024-05-27T05:52:13Z)
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies masked image modeling (MIM) and LIC end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
- Cross Modal Compression: Towards Human-comprehensible Semantic Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z)
- Image Compression with Encoder-Decoder Matched Semantic Segmentation [15.536056887418676]
Layered image compression is a promising direction.
Some works transmit the semantic segment together with the compressed image data.
We propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS).
The proposed EDMS framework achieves up to 35.31% BD-rate reduction over the HEVC-based BPG codec, along with encoding time savings.
arXiv Detail & Related papers (2021-01-24T04:11:05Z)
- How to Exploit the Transferability of Learned Image Compression to Conventional Codecs [25.622863999901874]
We show how learned image coding can be used as a surrogate to optimize an image for encoding.
Our approach can remodel a conventional image codec to adjust for the MS-SSIM distortion, with over 20% rate improvement and no decoding overhead.
arXiv Detail & Related papers (2020-12-03T12:34:51Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
- A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning-based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
arXiv Detail & Related papers (2020-02-09T14:21:08Z)