MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
- URL: http://arxiv.org/abs/2402.16749v3
- Date: Wed, 17 Apr 2024 14:06:28 GMT
- Title: MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
- Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang
- Abstract summary: This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to that semantic information, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from all of the above.
It achieves optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication.
- Score: 78.4051835615796
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to that semantic information, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs) content. It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication. The code will be released on https://github.com/lcysyzxdxc/MISC.
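The abstract names three encoder outputs (semantic text, a region map, and a compressed image bitstream) that a decoder combines. As a rough illustration of how such a payload's bitrate might be accounted for, here is a minimal Python sketch. The class `MiscPayload`, the helper functions, and all byte counts are hypothetical placeholders chosen for easy arithmetic; they are not taken from the paper or its code, and the decoder is not modeled.

```python
from dataclasses import dataclass


@dataclass
class MiscPayload:
    """Hypothetical container for what the three encoders would transmit."""
    semantic_text: str      # LMM encoder output: description of the image content
    region_map: bytes       # map encoder output: where the described content lies
    image_bitstream: bytes  # image encoder output: heavily compressed image

    def total_bits(self) -> int:
        # Assumption: the text is sent as plain UTF-8 with no entropy coding.
        text_bits = 8 * len(self.semantic_text.encode("utf-8"))
        return text_bits + 8 * len(self.region_map) + 8 * len(self.image_bitstream)


def bits_per_pixel(payload: MiscPayload, width: int, height: int) -> float:
    """Bitrate of the whole payload, normalised by image area."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return payload.total_bits() / (width * height)


def bitrate_saving(baseline_bpp: float, proposed_bpp: float) -> float:
    """Fraction of bitrate saved relative to a baseline codec."""
    if baseline_bpp <= 0:
        raise ValueError("baseline_bpp must be positive")
    return 1.0 - proposed_bpp / baseline_bpp


# Placeholder sizes only (not measured results).
payload = MiscPayload(
    semantic_text="a dog on a beach",  # 16 ASCII characters -> 128 bits
    region_map=bytes(32),              # 32 bytes -> 256 bits
    image_bitstream=bytes(464),        # 464 bytes -> 3712 bits
)
bpp = bits_per_pixel(payload, width=512, height=512)
print(f"{payload.total_bits()} bits, {bpp} bpp")
print(f"saving vs. a 0.03125 bpp baseline: {bitrate_saving(0.03125, bpp):.0%}")
```

With these placeholder sizes the payload totals 4096 bits, or 0.015625 bpp for a 512x512 image, which is half of the assumed 0.03125 bpp baseline. The point is only that the text and map side information count toward the bitrate alongside the image bitstream.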
Related papers
- Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates [47.47031054057152]
Generative models have been explored to compress RS images into extremely low-bitrate streams.
These generative models struggle to reconstruct visually plausible images due to the highly ill-posed nature of extremely low-bitrate image compression.
We propose an image compression framework that utilizes a pre-trained diffusion model with powerful natural image priors to achieve high-realism reconstructions.
arXiv Detail & Related papers (2024-09-03T14:29:54Z)
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp" for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of different vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z)
- CMC-Bench: Towards a New Paradigm of Visual Signal Compression [85.1839779884282]
We introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression.
At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal protocols.
arXiv Detail & Related papers (2024-06-13T17:41:37Z)
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
- Cross Modal Compression: Towards Human-comprehensible Semantic Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z)
- Image Compression with Encoder-Decoder Matched Semantic Segmentation [15.536056887418676]
Layered image compression is a promising direction.
Some works transmit the semantic segment together with the compressed image data.
We propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS).
The proposed EDMS framework can achieve up to 35.31% BD-rate reduction over the HEVC-based (BPG) codec, as well as encoding time savings.
arXiv Detail & Related papers (2021-01-24T04:11:05Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)