Efficient Masked Image Compression with Position-Indexed Self-Attention
- URL: http://arxiv.org/abs/2504.12923v1
- Date: Thu, 17 Apr 2025 13:12:39 GMT
- Title: Efficient Masked Image Compression with Position-Indexed Self-Attention
- Authors: Chengjie Dai, Tiantian Song, Hui Tang, Fangdong Chen, Bowei Yang, Guanghua Song
- Abstract summary: We propose an image compression method based on a position-indexed self-attention mechanism. Compared to existing semantic-structured compression methods, our approach can significantly reduce computational costs.
- Score: 6.64044416324419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, image compression for high-level vision tasks has attracted considerable attention from researchers. Given that object information in images plays a far more crucial role in downstream tasks than background information, some studies have proposed semantically structuring the bitstream to selectively transmit and reconstruct only the information required by these tasks. However, such methods structure the bitstream after encoding, meaning that the coding process still relies on the entire image, even though much of the encoded information will not be transmitted. This leads to redundant computations. Traditional image compression methods require a two-dimensional image as input, and even if the unimportant regions of the image are set to zero by applying a semantic mask, these regions still participate in subsequent computations as part of the image. To address such limitations, we propose an image compression method based on a position-indexed self-attention mechanism that encodes and decodes only the visible parts of the masked image. Compared to existing semantic-structured compression methods, our approach can significantly reduce computational costs.
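The core idea — running self-attention only over the visible patches while looking up each token's positional embedding by its original grid index — can be illustrated with a minimal sketch. This is not the authors' released code; the class name `PositionIndexedAttention`, the tensor shapes, and the PyTorch-style API choices below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PositionIndexedAttention(nn.Module):
    """Hypothetical sketch: self-attention over visible patches only,
    with positional embeddings gathered by each patch's original index."""

    def __init__(self, dim: int, num_heads: int, num_patches: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # One learnable positional embedding per location in the *full* patch grid.
        self.pos_emb = nn.Parameter(torch.zeros(num_patches, dim))

    def forward(self, visible_tokens: torch.Tensor, position_index: torch.Tensor):
        # visible_tokens: (B, N_visible, dim) -- embeddings of unmasked patches only
        # position_index: (B, N_visible)      -- each token's index in the full grid
        # Gathering by the original index preserves spatial identity even though
        # the masked patches were never embedded at all.
        pos = self.pos_emb[position_index]          # (B, N_visible, dim)
        x = visible_tokens + pos
        # Attention runs over N_visible tokens only, so compute scales with the
        # visible area rather than the full image.
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

# Toy usage: keep only the patches selected by a semantic mask.
B, grid, dim = 1, 16, 64                              # a 4x4 patch grid, flattened
tokens = torch.randn(B, grid, dim)                    # all patch embeddings
keep = torch.tensor([[0, 1, 5, 6, 9, 10]])            # indices a mask marked "important"
visible = tokens.gather(1, keep.unsqueeze(-1).expand(-1, -1, dim))
layer = PositionIndexedAttention(dim, num_heads=4, num_patches=grid)
out = layer(visible, keep)                            # attention over 6 tokens, not 16
```

Under these assumptions, the masked patches are dropped before encoding rather than zeroed out, which is what distinguishes this approach from feeding a zero-masked 2D image through a conventional codec.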
Related papers
- TFIC: End-to-End Text-Focused Image Compression for Coding for Machines [50.86328069558113]
We present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity.
arXiv Detail & Related papers (2025-03-25T09:36:13Z) - Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression [7.643300240138419]
We introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities. Our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information. Our method proficiently restores both semantic and visual details, competing against baseline approaches at extremely low bitrates.
arXiv Detail & Related papers (2024-12-17T15:01:35Z) - Perceptual Image Compression with Cooperative Cross-Modal Side
Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z) - You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z) - Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z) - How to Exploit the Transferability of Learned Image Compression to Conventional Codecs [25.622863999901874]
We show how learned image coding can be used as a surrogate to optimize an image for encoding.
Our approach can remodel a conventional image to adjust for the MS-SSIM distortion with over 20% rate improvement without any decoding overhead.
arXiv Detail & Related papers (2020-12-03T12:34:51Z) - Lossy Image Compression with Normalizing Flows [19.817005399746467]
State-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional latent space.
In contrast, traditional approaches in image compression allow for a larger range of quality levels.
arXiv Detail & Related papers (2020-08-24T14:46:23Z) - Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z) - Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images.
Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)