Multi-Modality Deep Network for Extreme Learned Image Compression
- URL: http://arxiv.org/abs/2304.13583v1
- Date: Wed, 26 Apr 2023 14:22:59 GMT
- Title: Multi-Modality Deep Network for Extreme Learned Image Compression
- Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen
- Abstract summary: We propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression.
In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions.
- Score: 31.532613540054697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based single-modality compression learning approaches have demonstrated
exceptionally powerful encoding and decoding capabilities in the past few years,
but they suffer from blur and severe semantic loss at extremely low bitrates. To
address this issue, we propose a multimodal machine learning method for
text-guided image compression, in which the semantic information of text is
used as prior information to guide image compression for better compression
performance. We thoroughly study the role of the text description in different
components of the codec and demonstrate its effectiveness. In addition, we
adopt the image-text attention module and image-request complement module to
better fuse image and text features, and propose an improved multimodal
semantic-consistent loss to produce semantically complete reconstructions.
Extensive experiments, including a user study, show that our method obtains
visually pleasing results at extremely low bitrates and achieves comparable
or even better performance than state-of-the-art methods, even though those
methods operate at 2x to 4x our bitrate.
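The abstract says an image-text attention module fuses image and text features, but does not spell out its internals. The following is only a generic cross-attention sketch of that kind of fusion, not the authors' module: it assumes image latent positions act as queries and text-token embeddings (for example from a frozen text encoder) act as keys and values, and the function names and weight matrices (`w_q`, `w_k`, `w_v`) are hypothetical. Pure Python is used only to keep it self-contained.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def image_text_cross_attention(img: Matrix, txt: Matrix,
                               w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical fusion step: image tokens (queries) attend over text tokens.

    img: N x C image features (e.g., flattened latent positions)
    txt: T x D text-token embeddings
    w_q: C x d, w_k: D x d, w_v: D x C (assumed learned projections)
    Returns N x C: each image token plus its text-conditioned context.
    """
    q = matmul(img, w_q)              # N x d
    k = matmul(txt, w_k)              # T x d
    v = matmul(txt, w_v)              # T x C
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))  # N x T similarity of each image token to each word
    attn = [softmax([s * scale for s in row]) for row in scores]
    context = matmul(attn, v)         # N x C text context per image token
    # Residual fusion: keep the image features and add the text context.
    return [[x + c for x, c in zip(xi, ci)] for xi, ci in zip(img, context)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(image_text_cross_attention([[1.0, 2.0]], [[0.5, -1.0]], eye, eye, eye))
```

Because the image features pass through the residual path, the text only adds a conditioning signal; with a single text token, every image position receives the same context vector, while with several tokens each position weights the words by similarity.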
Related papers
- Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity [18.469136842357095]
We develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity.
By using text to guide encoding, we avoid decoding based on text-guided generative models.
Our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions.
Date: 2024-03-05T13:15:01Z
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression (MISC).
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to those semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from all of the above.
It achieves optimal consistency and perception results while saving 50% of the bitrate, which gives it strong potential for next-generation storage and communication.
Date: 2024-02-26T17:11:11Z
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
Date: 2024-01-13T04:54:59Z
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
Date: 2023-11-23T08:31:11Z
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have made significant progress in recent years.
However, LIC methods fail to explicitly explore the image structure and texture components that are crucial for image compression.
We present DA-Mask, which samples visible patches based on the structure and texture of the original image.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies masked image modeling (MIM) and LIC end-to-end for extremely low-bitrate compression.
Date: 2023-06-27T15:36:22Z
- Multi-Modality Deep Network for JPEG Artifacts Reduction [33.02405073842042]
We propose a multimodal fusion learning method for text-guided JPEG artifacts reduction.
Our method obtains better deblocking results than state-of-the-art methods.
Date: 2023-05-04T11:54:02Z
- Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models [13.894251782142584]
We propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding.
Our method outperforms other state-of-the-art deep learning methods in terms of both perceptual quality and diversity.
Date: 2022-11-14T22:54:19Z
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
Date: 2020-06-22T04:04:56Z
- Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
Date: 2020-02-10T13:13:43Z
- A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning-based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
Date: 2020-02-09T14:21:08Z
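The Learning End-to-End Lossy Image Compression: A Benchmark entry above mentions a hyperprior model for entropy estimation, and "extremely low bitrate" claims like the ones in the main abstract are ultimately measured in estimated or actual bits per pixel. The sketch below shows how per-latent Gaussian parameters become a bit estimate and a rate-distortion loss. It assumes the means and scales come from a hypothetical hyper-decoder and that latents are rounded to unit-width bins; it is the standard discretized-Gaussian formulation, not the specific coarse-to-fine model from that paper, and all helper names are illustrative.

```python
import math
from typing import List


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_bits(y_hat: float, mu: float, sigma: float, eps: float = 1e-9) -> float:
    """Bits for one quantized latent under a discretized Gaussian.

    The probability is the Gaussian mass inside the unit-width bin centred on y_hat;
    eps guards against log2(0) for values far in the tails.
    """
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return -math.log2(max(p, eps))


def estimated_rate(latents: List[float], mus: List[float], sigmas: List[float]) -> float:
    """Total estimated bits; mus/sigmas are assumed to come from a hyper-decoder."""
    y_hat = [round(y) for y in latents]  # hard rounding, as at inference time
    return sum(symbol_bits(q, m, s) for q, m, s in zip(y_hat, mus, sigmas))


def rd_loss(rate_bits: float, distortion: float, num_pixels: int, lam: float) -> float:
    """Generic rate-distortion objective: bits per pixel + lambda * distortion."""
    return rate_bits / num_pixels + lam * distortion


if __name__ == "__main__":
    bits = estimated_rate([0.2, -1.3, 2.6], [0.0, -1.0, 2.0], [0.5, 1.0, 2.0])
    print(bits, rd_loss(bits, distortion=0.01, num_pixels=16, lam=10.0))
```

Confident predictions (small `sigma` around the right mean) cost close to zero bits, which is why a better prior, whether from a hyperprior or, in the main paper's setting, from a text description, can lower the bitrate without changing the latents themselves.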
This list is automatically generated from the titles and abstracts of the papers on this site.