SMIC: Semantic Multi-Item Compression based on CLIP dictionary
- URL: http://arxiv.org/abs/2412.05035v1
- Date: Fri, 06 Dec 2024 13:39:36 GMT
- Title: SMIC: Semantic Multi-Item Compression based on CLIP dictionary
- Authors: Tom Bachard, Thomas Maugey
- Abstract summary: Most recent semantic compression schemes rely on the foundation model CLIP.
We extend such a scheme to image collection compression, where inter-item redundancy is taken into account during the coding phase.
We also show that the learned dictionary is of a semantic nature and works as a semantic projector for the semantic content of images.
- Score: 9.17424462858218
- License:
- Abstract: Semantic compression, a compression scheme in which the distortion metric, typically MSE, is replaced with a semantic fidelity metric, is becoming increasingly popular. Most recent semantic compression schemes rely on the foundation model CLIP. In this work, we extend such a scheme to image collection compression, where inter-item redundancy is taken into account during the coding phase. For that purpose, we first show that CLIP's latent space allows for easy semantic additions and subtractions. Building on this property, we define a dictionary-based multi-item codec that outperforms state-of-the-art generative codecs in terms of compression rate, around $10^{-5}$ BPP per image, while not sacrificing semantic fidelity. We also show that the learned dictionary is of a semantic nature and works as a semantic projector for the semantic content of images.
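The semantic additions and subtractions the abstract relies on can be sketched with a toy stand-in for CLIP's latent space. The vectors, concept names, and gallery below are illustrative assumptions, not real CLIP embeddings; only the arithmetic and the cosine-similarity retrieval pattern are what the abstract describes.

```python
import math

# Toy stand-in for CLIP's latent space: each concept is a sum of a few
# attribute axes. Real CLIP embeddings are 512/768-d vectors produced by
# the image/text encoders; only the arithmetic is illustrated here.
AXES = {
    "beach": [1.0, 0.0, 0.0, 0.0],
    "city":  [0.0, 1.0, 0.0, 0.0],
    "dog":   [0.0, 0.0, 1.0, 0.0],
    "cat":   [0.0, 0.0, 0.0, 1.0],
}

def emb(*concepts):
    """Sum the attribute axes of the given concepts."""
    return [sum(AXES[c][i] for c in concepts) for i in range(4)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Semantic arithmetic: "a dog on the beach" - "dog" + "cat"
query = [a - b + c for a, b, c in zip(emb("beach", "dog"), emb("dog"), emb("cat"))]

gallery = {
    "a cat on the beach": emb("beach", "cat"),
    "a dog on the beach": emb("beach", "dog"),
    "a dog in the city":  emb("city", "dog"),
}
best = max(gallery, key=lambda k: cosine(query, gallery[k]))
print(best)  # → "a cat on the beach"
```

Subtracting one concept's embedding and adding another's moves the query to the edited semantics; the multi-item codec exploits the same linearity to share dictionary atoms across images.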
Related papers
- SQ-GAN: Semantic Image Communications Using Masked Vector Quantization [55.02795214161371]
This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
arXiv Detail & Related papers (2025-02-13T17:35:57Z)
- Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need [53.584140947828004]
Large language models (LLMs), with their unprecedented intelligence, are general-purpose lossless compressors for various data modalities.
We propose P$2$-LLM, a next-pixel prediction-based LLM, which integrates various elaborated insights and methodologies.
Experiments on benchmark datasets demonstrate that P$2$-LLM can beat SOTA classical and learned codecs.
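The next-pixel-prediction principle behind this line of work can be illustrated with a toy predictor. The "repeat the previous pixel" model below is a hypothetical stand-in, not P$^2$-LLM; the point is only that under an ideal entropy coder, a pixel the model assigns probability $p$ costs $-\log_2 p$ bits, so good predictions compress losslessly.

```python
import math

# Toy illustration of prediction-based lossless coding: the predictor
# assigns probability p_repeat to "same as the previous pixel" and
# spreads the remaining mass uniformly over the other 255 byte values.
def ideal_bits(pixels, p_repeat=0.9):
    """Ideal (entropy-coded) length in bits for an 8-bit pixel sequence."""
    bits = 8.0  # first pixel is sent raw
    for prev, cur in zip(pixels, pixels[1:]):
        p = p_repeat if cur == prev else (1 - p_repeat) / 255
        bits += -math.log2(p)
    return bits

flat = [42] * 16          # highly predictable row: far below 8 bits/pixel
noisy = list(range(16))   # never repeats: the model's guesses cost extra
print(ideal_bits(flat), ideal_bits(noisy))
```

A stronger predictor (an LLM over pixel tokens, in the paper's case) makes the assigned probabilities sharper and the total code length shorter.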
arXiv Detail & Related papers (2024-11-19T12:15:40Z)
- Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression [54.62883091552163]
Unsupervised video semantic compression (UVSC) has recently garnered attention.
We propose to boost the UVSC task by absorbing the off-the-shelf rich semantics from VFMs.
We introduce a VFMs-shared semantic alignment layer, complemented by VFM-specific prompts, to flexibly align semantics between the compressed video and various VFMs.
arXiv Detail & Related papers (2024-09-18T05:55:01Z)
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve intelligent "coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp", for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z)
- SMC++: Masked Learning of Unsupervised Video Semantic Compression [54.62883091552163]
We propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics.
MVM is proficient at learning generalizable semantics through the masked patch prediction task.
However, MVM may also encode non-semantic information such as trivial textural details, wasting bitrate and introducing semantic noise.
arXiv Detail & Related papers (2024-06-07T09:06:40Z)
- Crossword: A Semantic Approach to Data Compression via Masking [38.107509264270924]
This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further.
The proposed masking-based strategy resembles the above game.
In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer.
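The encoder side of this scheme can be sketched as follows. The stopword list is a crude hypothetical stand-in for the paper's semantic-loss-based importance scores, and the Transformer-based recovery step at the decoder is omitted entirely.

```python
# Toy sketch of masking-based semantic compression: semantically minor
# words are replaced with a mask token; a decoder-side language model
# (a Transformer, in the paper) would later infill them from context.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is"}  # hypothetical importance proxy

def mask_minor_words(text, mask="_"):
    """Replace low-importance words with a mask token to shrink the payload."""
    return " ".join(mask if w.lower() in STOPWORDS else w for w in text.split())

print(mask_minor_words("the cat sat on a mat"))  # → "_ cat sat on _ mat"
```

Only the unmasked words (plus mask positions) need to be transmitted; fidelity then depends on how reliably the decoder can reconstruct the masked words from the surviving semantic context.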
arXiv Detail & Related papers (2023-04-03T16:04:06Z)
- Cross Modal Compression: Towards Human-comprehensible Semantic Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z)
- Towards Semantic Communications: Deep Learning-Based Image Semantic Coding [42.453963827153856]
We conceive semantic communications for image data, which is much richer in semantics and more bandwidth-sensitive.
We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level.
Experimental results demonstrate that the proposed RL-ASC is noise-robust and can reconstruct visually pleasing and semantically consistent images.
arXiv Detail & Related papers (2022-08-08T12:29:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.