Multi-Modality Deep Network for Extreme Learned Image Compression
- URL: http://arxiv.org/abs/2304.13583v1
- Date: Wed, 26 Apr 2023 14:22:59 GMT
- Title: Multi-Modality Deep Network for Extreme Learned Image Compression
- Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen
- Abstract summary: We propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better performance.
In addition, we adopt an image-text attention module and an image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions.
- Score: 31.532613540054697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blurring and severe semantic loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We fully study the role of the text description in different components of the codec and demonstrate its effectiveness. In addition, we adopt an image-text attention module and an image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, show that our method obtains visually pleasing results at extremely low bitrates and achieves comparable or even better performance than state-of-the-art methods, even though those methods operate at 2x to 4x the bitrate of ours.
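The image-text attention fusion described above can be pictured as cross-attention in which image patches query text tokens. Below is a minimal NumPy sketch (single-head, with untrained random features); the function name and shapes are illustrative assumptions, not the paper's actual module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_text_attention(img_feats, txt_feats):
    """Fuse text semantics into image features via cross-attention.

    img_feats: (N_img, d) image tokens (queries).
    txt_feats: (N_txt, d) text tokens (keys/values).
    Returns image features enriched with text context (residual add).
    """
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)   # (N_img, N_txt)
    attn = softmax(scores, axis=-1)                 # attend over text tokens
    return img_feats + attn @ txt_feats             # residual fusion

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 32))   # 16 image patches, 32-dim features
txt = rng.standard_normal((8, 32))    # 8 text tokens
fused = image_text_attention(img, txt)
print(fused.shape)  # (16, 32)
```

In a real codec these features would come from learned encoders and the attention would have trainable projections; the sketch only shows how text tokens can condition every image patch.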
Related papers
- All-in-One Image Compression and Restoration [55.25638059492943]
We propose a unified framework for all-in-one image compression and restoration.
It incorporates the image restoration capability against various degradations into the process of image compression.
arXiv Detail & Related papers (2025-02-05T22:21:05Z)
- CALLIC: Content Adaptive Learning for Lossless Image Compression [64.47244912937204]
CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.
We propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations.
During encoding, we decompose pre-trained layers, including depth-wise convolutions, using low-rank matrices, and then adapt the incremental weights to the test image via Rate-guided Progressive Fine-Tuning (RPFT).
RPFT fine-tunes with gradually increasing patches, sorted in descending order of estimated entropy, optimizing the learning process and reducing adaptation time.
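The entropy-ordered, progressively growing patch schedule behind RPFT can be sketched as follows. This is an illustrative reconstruction of the idea, not CALLIC's code; the histogram-based `patch_entropy` estimate and the stage fractions are assumptions:

```python
import numpy as np

def patch_entropy(patch, bins=32):
    """Shannon entropy of a patch's intensity histogram (bits)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def rpft_schedule(image, patch=8, stages=(0.25, 0.5, 1.0)):
    """Order patches by estimated entropy (descending) and yield
    progressively larger subsets to fine-tune on, RPFT-style."""
    H, W = image.shape
    patches = [image[r:r+patch, c:c+patch]
               for r in range(0, H, patch) for c in range(0, W, patch)]
    order = sorted(range(len(patches)),
                   key=lambda i: patch_entropy(patches[i]), reverse=True)
    for frac in stages:
        k = max(1, int(frac * len(order)))
        yield [patches[i] for i in order[:k]]   # fine-tune on these patches

img = np.random.default_rng(1).random((32, 32))   # 4x4 grid of 8x8 patches
subsets = list(rpft_schedule(img))
print([len(s) for s in subsets])  # [4, 8, 16]
```

Starting with the highest-entropy patches front-loads the hardest-to-compress content, which is the stated rationale for reduced adaptation time.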
arXiv Detail & Related papers (2024-12-23T10:41:18Z) - Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity [18.469136842357095]
We develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity.
By doing so, we avoid decoding based on text-guided generative models.
Our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions.
arXiv Detail & Related papers (2024-03-05T13:15:01Z) - ENTED: Enhanced Neural Texture Extraction and Distribution for
Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z) - Perceptual Image Compression with Cooperative Cross-Modal Side
Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
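One simple way to picture fusing a CLIP text embedding with spatial image features is FiLM-style channel-wise modulation. The paper's Semantic-Spatial Aware block is a learned module, so the sketch below is only a hypothetical stand-in with random projection weights:

```python
import numpy as np

def text_modulate(img_feats, txt_emb, W_gamma, W_beta):
    """Modulate spatial image features with a text embedding
    (FiLM-style scale-and-shift; a simplified stand-in for a
    learned text-image fusion block).

    img_feats: (H, W, C); txt_emb: (d,); W_gamma, W_beta: (d, C).
    """
    gamma = txt_emb @ W_gamma          # (C,) channel-wise scale
    beta = txt_emb @ W_beta            # (C,) channel-wise shift
    return img_feats * (1.0 + gamma) + beta

rng = np.random.default_rng(2)
feats = rng.standard_normal((4, 4, 8))   # toy spatial feature map
emb = rng.standard_normal(16)            # e.g. a text embedding vector
out = text_modulate(feats, emb,
                    rng.standard_normal((16, 8)) * 0.1,
                    rng.standard_normal((16, 8)) * 0.1)
print(out.shape)  # (4, 4, 8)
```

With zero projection weights the block reduces to the identity, which is a common initialization choice so the side information perturbs, rather than replaces, the image pathway.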
arXiv Detail & Related papers (2023-11-23T08:31:11Z) - You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have experienced significant progress during recent years.
LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask that samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
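Sampling visible patches based on image structure and texture can be approximated with local variance as a texture proxy. A minimal sketch of the idea (the variance score and keep ratio are illustrative assumptions, not the paper's learned DA-Mask sampler):

```python
import numpy as np

def sample_visible_patches(image, patch=8, keep_ratio=0.25):
    """Keep the most textured patches visible (a DA-Mask-style idea,
    using per-patch variance as a simple texture score).

    Returns a boolean mask over the patch grid: True = visible.
    """
    H, W = image.shape
    gh, gw = H // patch, W // patch
    var = np.array([[image[r*patch:(r+1)*patch, c*patch:(c+1)*patch].var()
                     for c in range(gw)] for r in range(gh)])
    k = max(1, int(keep_ratio * gh * gw))     # number of visible patches
    thresh = np.sort(var.ravel())[-k]         # k-th largest variance
    return var >= thresh

img = np.random.default_rng(3).random((32, 32))   # 4x4 patch grid
mask = sample_visible_patches(img)
print(mask.shape, int(mask.sum()))  # (4, 4) 4
```

Only the visible patches would then be encoded, with the masked ones reconstructed by the compression model.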
arXiv Detail & Related papers (2023-06-27T15:36:22Z) - Multi-Modality Deep Network for JPEG Artifacts Reduction [33.02405073842042]
We propose a multimodal fusion learning method for text-guided JPEG artifacts reduction.
Our method can obtain better deblocking results compared to the state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T11:54:02Z) - Extreme Generative Image Compression by Learning Text Embedding from
Diffusion Models [13.894251782142584]
We propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding.
Our method outperforms other state-of-the-art deep learning methods in terms of both perceptual quality and diversity.
arXiv Detail & Related papers (2022-11-14T22:54:19Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z) - A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
arXiv Detail & Related papers (2020-02-09T14:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.