CMC-Bench: Towards a New Paradigm of Visual Signal Compression
- URL: http://arxiv.org/abs/2406.09356v1
- Date: Thu, 13 Jun 2024 17:41:37 GMT
- Title: CMC-Bench: Towards a New Paradigm of Visual Signal Compression
- Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin,
- Abstract summary: We introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression.
At ultra-lows, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal protocols.
- Score: 85.1839779884282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
Related papers
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve intelligently coding for machine'' by cleverly leveraging the common sense of Large Multimodal Models (LMMs)
We dub our method textitSDComp'' for textitSemantically textitDisentangled textitCompression'', and compare it with state-of-the-art codecs on a wide variety of different vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantic, an image encoder generates an extremely compressed bitstream, and a decoder reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving perceptual 50%, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z) - You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and LIC end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z) - Cross Modal Compression: Towards Human-comprehensible Semantic
Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z) - PILC: Practical Image Lossless Compression with an End-to-end GPU
Oriented Neural Framework [88.18310777246735]
We propose an end-to-end image compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU.
Experiments show that our framework compresses better than PNG by a margin of 30% in multiple datasets.
arXiv Detail & Related papers (2022-06-10T03:00:10Z) - Split Hierarchical Variational Compression [21.474095984110622]
Variational autoencoders (VAEs) have witnessed great success in performing the compression of image datasets.
SHVC introduces an efficient autoregressive sub-pixel convolution, that allows a generalisation between per-pixel autoregressions and fully factorised probability models.
arXiv Detail & Related papers (2022-04-05T09:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.