Extreme Generative Image Compression by Learning Text Embedding from
Diffusion Models
- URL: http://arxiv.org/abs/2211.07793v1
- Date: Mon, 14 Nov 2022 22:54:19 GMT
- Title: Extreme Generative Image Compression by Learning Text Embedding from
Diffusion Models
- Authors: Zhihong Pan, Xin Zhou, Hao Tian
- Abstract summary: We propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding.
Our method outperforms other state-of-the-art deep learning methods in terms of both perceptual quality and diversity.
- Score: 13.894251782142584
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transferring large amounts of high-resolution images over limited
bandwidth is an important but very challenging task. Compressing images at
extremely low bitrates (<0.1 bpp) has been studied, but it often results in
low-quality images with heavy artifacts due to the strong constraint on the
number of bits available for the compressed data. It is often said that a
picture is worth a thousand words; on the other hand, language is very
powerful in capturing the essence of an image in a short description. With the
recent success of diffusion models for text-to-image generation, we propose a
generative image compression method that demonstrates the potential of saving
an image as a short text embedding, which in turn can be used to generate
high-fidelity images that are perceptually equivalent to the original. For a
given image, its corresponding text embedding is learned using the same
optimization process as the text-to-image diffusion model itself: the original
text transformer is bypassed, and a learnable text embedding is fed to the
model in its place. This optimization is applied together with a learned
compression model to achieve extreme compression at low bitrates (<0.1 bpp).
In our experiments, measured by a comprehensive set of image quality metrics,
our method outperforms other state-of-the-art deep learning methods in terms
of both perceptual quality and diversity.
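A minimal sketch of the core optimization described in the abstract, similar in spirit to textual inversion: the text transformer is bypassed and a free embedding is optimized with the frozen diffusion model's own denoising loss. The `unet` and `scheduler` objects stand in for a pre-trained latent diffusion model (e.g. in the style of the diffusers library), and all shapes and hyperparameters are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def learn_text_embedding(image_latent, unet, scheduler,
                         steps=1000, num_tokens=8, embed_dim=768, lr=5e-3):
    # Learnable embedding fed to the UNet's cross-attention in place of
    # the bypassed text transformer's output (shapes are assumptions).
    text_emb = torch.randn(1, num_tokens, embed_dim, requires_grad=True)
    opt = torch.optim.Adam([text_emb], lr=lr)

    for _ in range(steps):
        # Same objective the diffusion model was trained with: noise the
        # latent at a random timestep and predict that noise.
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
        noise = torch.randn_like(image_latent)
        noisy = scheduler.add_noise(image_latent, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=text_emb).sample

        loss = F.mse_loss(pred, noise)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The optimized embedding (further compressed by a learned compression
    # model in the paper's full pipeline) acts as the stored representation.
    return text_emb.detach()
```

At decode time, the same frozen model samples an image conditioned on the recovered embedding, targeting perceptual rather than pixel-exact fidelity.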
Related papers
- Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior [8.772652777234315]
We propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models.
Our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates.
arXiv Detail & Related papers (2024-04-29T16:02:38Z)
- Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity [18.469136842357095]
We develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity.
By doing so, we avoid decoding based on text-guided generative models.
Our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions.
arXiv Detail & Related papers (2024-03-05T13:15:01Z)
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
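As a rough illustration of text-guided side information in the paper above, the sketch below injects a CLIP text embedding into a convolutional feature map through a learned channel-wise modulation. `TextGuidedFusion` is a hypothetical stand-in, not the paper's Semantic-Spatial Aware block, and its dimensions are assumed.

```python
import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Hypothetical fusion block: modulate image features channel-wise
    with a projected CLIP text embedding (not the paper's exact design)."""

    def __init__(self, text_dim=512, channels=192):
        super().__init__()
        # Project the text embedding to per-channel scale and shift.
        self.to_scale_shift = nn.Linear(text_dim, 2 * channels)

    def forward(self, img_feat, text_emb):
        # img_feat: (B, C, H, W); text_emb: (B, text_dim), e.g. from a
        # CLIP text encoder's pooled output.
        scale, shift = self.to_scale_shift(text_emb).chunk(2, dim=-1)
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return img_feat * (1 + scale) + shift
```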
- Text + Sketch: Image Compression at Ultra Low Rates [22.771914148234103]
We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions.
Our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
arXiv Detail & Related papers (2023-07-04T22:26:20Z)
- You Can Mask More For Extremely Low-Bitrate Image Compression [80.7692466922499]
Learned image compression (LIC) methods have made significant progress in recent years.
However, LIC methods fail to explicitly explore the image structure and texture components crucial for image compression.
We present DA-Mask, which samples visible patches based on the structure and texture of original images.
We propose a simple yet effective masked compression model (MCM), the first framework that unifies LIC and masked image modeling (MIM) end-to-end for extremely low-bitrate compression.
arXiv Detail & Related papers (2023-06-27T15:36:22Z)
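For a flavor of the masking step in the entry above, here is a generic patch-sampling sketch in PyTorch. It keeps a random subset of patches; DA-Mask's actual structure- and texture-aware scoring is not reproduced, and all sizes are assumptions.

```python
import torch

def mask_patches(img, patch=16, keep_ratio=0.25):
    """Keep a subset of patches; a stand-in for DA-Mask's
    structure- and texture-aware sampling."""
    b, c, h, w = img.shape
    ph, pw = h // patch, w // patch
    n = ph * pw
    keep = max(1, int(n * keep_ratio))
    # Random scores per patch; DA-Mask would score by structure/texture.
    idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
    # Cut the image into (B, n, C*patch*patch) flattened patches.
    patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, n, -1)
    visible = torch.gather(patches, 1,
                           idx[:, :, None].expand(-1, -1, patches.shape[-1]))
    return visible, idx  # visible patches and their positions
```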
- Multi-Modality Deep Network for Extreme Learned Image Compression [31.532613540054697]
We propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior knowledge to guide image compression.
In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions.
arXiv Detail & Related papers (2023-04-26T14:22:59Z)
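The image-text attention idea in the entry above can be approximated with standard cross-attention, where flattened image features query text tokens. This is a generic sketch with assumed dimensions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ImageTextAttention(nn.Module):
    """Sketch of image-text attention: image tokens query text tokens.
    Dimensions are illustrative, not taken from the paper."""

    def __init__(self, img_dim=192, text_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=img_dim, num_heads=heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)

    def forward(self, img_feat, text_tokens):
        # img_feat: (B, C, H, W) -> (B, H*W, C) sequence of image tokens;
        # text_tokens: (B, T, text_dim) from a text encoder.
        b, c, h, w = img_feat.shape
        seq = img_feat.flatten(2).transpose(1, 2)
        fused, _ = self.attn(query=seq, key=text_tokens, value=text_tokens)
        # Residual connection, then restore the spatial layout.
        seq = seq + fused
        return seq.transpose(1, 2).reshape(b, c, h, w)
```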
- Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics drawn from both the input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Quantization Guided JPEG Artifact Correction [69.04777875711646]
We develop a novel architecture for artifact correction using the JPEG file's quantization matrix.
This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
arXiv Detail & Related papers (2020-04-17T00:10:08Z)
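The quantization matrix such a model conditions on can be read straight from a JPEG file. A minimal sketch with Pillow follows; the correction network itself is only assumed, not part of any released code.

```python
import numpy as np
import torch
from PIL import Image

# Pillow exposes a JPEG's quantization tables as a dict of 64-entry
# lists (zigzag order); table 0 is typically the luma table.
img = Image.open("photo.jpg")
luma_q = np.array(img.quantization[0], dtype=np.float32).reshape(8, 8)

# Schematic conditioning: normalize the table and pass it alongside the
# decoded pixels to an artifact-correction network (assumed, not shown).
q_cond = torch.from_numpy(luma_q / 255.0).flatten()[None]       # (1, 64)
pixels = torch.from_numpy(np.asarray(img, np.float32) / 255.0)  # (H, W, 3)
# restored = correction_net(pixels.permute(2, 0, 1)[None], q_cond)
```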