Cross Modal Compression: Towards Human-comprehensible Semantic
Compression
- URL: http://arxiv.org/abs/2209.02574v1
- Date: Tue, 6 Sep 2022 15:31:11 GMT
- Title: Cross Modal Compression: Towards Human-comprehensible Semantic
Compression
- Authors: Jiguo Li, Chuanmin Jia, Xinfeng Zhang, Siwei Ma, Wen Gao
- Abstract summary: Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
- Score: 73.89616626853913
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional image/video compression aims to reduce the transmission/storage
cost with signal fidelity as high as possible. However, with the increasing
demand for machine analysis and semantic monitoring in recent years, semantic
fidelity rather than signal fidelity is becoming another emerging concern in
image/video compression. Building on recent advances in cross modal translation
and generation, we propose cross modal compression (CMC), a
semantic compression framework for visual data that transforms highly redundant
visual data (such as images and video) into a compact, human-comprehensible
domain (such as text, sketches, semantic maps, or attributes) while
preserving the semantics. Specifically, we first formulate the CMC problem as a
rate-distortion optimization problem. Secondly, we investigate the relationship
with the traditional image/video compression and the recent feature compression
frameworks, showing the difference between our CMC and these prior frameworks.
Then we propose a novel paradigm for CMC to demonstrate its effectiveness. The
qualitative and quantitative results show that our proposed CMC can achieve
encouraging reconstructed results with an ultrahigh compression ratio, showing
better compression performance than the widely used JPEG baseline.
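The abstract states that CMC is formulated as a rate-distortion optimization problem but does not give the objective. As a rough illustration only, the sketch below uses the generic Lagrangian form J = R + lambda * D, which is an assumption and not necessarily the paper's formulation. The caption, image size, and the helper names `bits_per_pixel` and `rd_cost` are hypothetical; the distortion argument stands in for whatever semantic distortion measure is chosen.

```python
def bits_per_pixel(payload: bytes, width: int, height: int) -> float:
    """Rate of a compressed representation, in bits per pixel."""
    return 8 * len(payload) / (width * height)


def rd_cost(rate_bpp: float, distortion: float, lam: float) -> float:
    """Generic Lagrangian rate-distortion cost J = R + lambda * D (assumed form)."""
    return rate_bpp + lam * distortion


# Hypothetical example: a text caption used as the compact representation
# of a 256x256 RGB image (24 bits per pixel when stored uncompressed).
caption = "a small yellow bird perched on a bare branch".encode("utf-8")
width, height = 256, 256

raw_bits = width * height * 24
rate = bits_per_pixel(caption, width, height)
compression_ratio = raw_bits / (8 * len(caption))

print(f"rate: {rate:.5f} bpp, compression ratio: {compression_ratio:.0f}x")
```

The point of the sketch is only the order of magnitude: a short text description costs a few hundred bits, so the rate term becomes tiny and the trade-off is dominated by how well the decoder can regenerate an image with the same semantics.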
Related papers
- Learned Image Compression for HE-stained Histopathological Images via Stain Deconvolution [33.69980388844034]
In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression.
We propose Stain Quantized Latent Compression, a novel DL based histopathology data compression approach.
We show that our approach yields superior performance in a classification downstream task, compared to traditional approaches like JPEG.
arXiv Detail & Related papers (2024-06-18T13:47:17Z)
- SMC++: Masked Learning of Unsupervised Video Semantic Compression [54.62883091552163]
We propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics.
MVM is proficient at learning generalizable semantics through the masked patch prediction task.
However, MVM may also encode non-semantic information such as trivial textural details, wasting bits and introducing semantic noise.
arXiv Detail & Related papers (2024-06-07T09:06:40Z)
- Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the region corresponding to each semantic, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information.
It can achieve optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- Extreme Image Compression using Fine-tuned VQGANs [43.43014096929809]
We introduce vector quantization (VQ)-based generative models into the image compression domain.
The codebook learned by the VQGAN model yields a strong expressive capacity.
The proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics.
arXiv Detail & Related papers (2023-07-17T06:14:19Z)
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
- A Unified Image Preprocessing Framework For Image Compression [5.813935823171752]
We propose a unified image compression preprocessing framework, called Kuchen, to improve the performance of existing codecs.
The framework consists of a hybrid data labeling system along with a learning-based backbone to simulate personalized preprocessing.
Results demonstrate that modern codecs optimized by our unified preprocessing framework consistently improve the efficiency of state-of-the-art compression.
arXiv Detail & Related papers (2022-08-15T10:41:00Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves better performance than recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- Enhanced Invertible Encoding for Learned Image Compression [40.21904131503064]
In this paper, we propose an enhanced Invertible Encoding Network with invertible neural networks (INNs) to largely mitigate the information loss problem for better compression.
Experimental results on the Kodak, CLIC, and Tecnick datasets show that our method outperforms the existing learned image compression methods.
arXiv Detail & Related papers (2021-08-08T17:32:10Z)