Text + Sketch: Image Compression at Ultra Low Rates
- URL: http://arxiv.org/abs/2307.01944v1
- Date: Tue, 4 Jul 2023 22:26:20 GMT
- Title: Text + Sketch: Image Compression at Ultra Low Rates
- Authors: Eric Lei, Yiğit Berkay Uslu, Hamed Hassani, Shirin Saeedi Bidokhti
- Abstract summary: We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions.
Our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
- Score: 22.771914148234103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in text-to-image generative models provide the ability to
generate high-quality images from short text descriptions. These foundation
models, when pre-trained on billion-scale datasets, are effective for various
downstream tasks with little or no further training. A natural question to ask
is how such models may be adapted for image compression. We investigate several
techniques in which the pre-trained models can be directly used to implement
compression schemes targeting novel low rate regimes. We show how text
descriptions can be used in conjunction with side information to generate
high-fidelity reconstructions that preserve both semantics and spatial
structure of the original. We demonstrate that at very low bit-rates, our
method can significantly improve upon learned compressors in terms of
perceptual and semantic fidelity, despite no end-to-end training.
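As a rough illustration of how a pre-trained text-to-image model could be turned into a codec of this kind (a hypothetical sketch, not the authors' released pipeline), the example below captions the image with a pre-trained BLIP model, codes a Canny edge map as the "sketch" side information, and reconstructs by conditioning a ControlNet-guided Stable Diffusion model on both streams; the model IDs, the Canny/PNG sketch coding, and the `encode`/`decode` helpers are all assumptions made for illustration.

```python
# Hypothetical sketch of a text + sketch codec assembled from off-the-shelf
# pre-trained models. Model IDs, the Canny/PNG sketch coding, and the helper
# names are illustrative assumptions, not the paper's exact pipeline.
import io

import cv2
import numpy as np
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Encoder side: a caption (text stream) plus a compressed edge map (side information).
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

def encode(image: Image.Image) -> tuple[bytes, bytes]:
    inputs = blip_processor(image, return_tensors="pt").to(device)
    caption_ids = blip.generate(**inputs, max_new_tokens=30)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)
    edges = cv2.Canny(np.array(image.convert("L")), 100, 200)  # binary "sketch"
    buf = io.BytesIO()
    Image.fromarray(edges).save(buf, format="PNG")              # losslessly coded sketch
    return caption.encode("utf-8"), buf.getvalue()

# Decoder side: a pre-trained text-to-image model conditioned on both streams.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet).to(device)

def decode(caption_bytes: bytes, sketch_bytes: bytes) -> Image.Image:
    sketch = Image.open(io.BytesIO(sketch_bytes)).convert("RGB")
    return pipe(prompt=caption_bytes.decode("utf-8"), image=sketch).images[0]
```

In this sketch the bit-rate is simply the combined length of the two byte strings; how the paper actually represents and entropy-codes its sketch may differ, so treat this only as a structural outline.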
Related papers
- LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression [11.371756033920995]
This paper demonstrates that it is possible to generate captions and compress them within a single model.
We also propose a novel semantic-perceptual-oriented fine-tuning method applicable to any LIC network.
arXiv Detail & Related papers (2024-11-20T04:43:37Z)
- A Training-Free Defense Framework for Robust Learned Image Compression [48.41990144764295]
We study the robustness of learned image compression models against adversarial attacks.
We present a training-free defense technique based on simple image transform functions.
arXiv Detail & Related papers (2024-01-22T12:50:21Z)
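For the transform-based defense summarized above, a minimal sketch of the idea (my own reading, assuming random resizing and flipping as the "simple image transform functions", and treating the codec as an arbitrary callable):

```python
# Minimal sketch of a training-free, transform-based defense for a learned codec.
# The choice of transforms (random resize + horizontal flip) and the `codec`
# interface are assumptions for illustration, not the paper's exact recipe.
import random

import torch
import torch.nn.functional as F

def defended_compress(codec, x: torch.Tensor) -> torch.Tensor:
    """Apply simple random transforms before a (possibly attacked) image batch x
    is fed to `codec`, a callable mapping an image batch to its reconstruction."""
    b, c, h, w = x.shape
    # 1) Random rescale disrupts pixel-aligned adversarial noise.
    scale = random.uniform(0.9, 1.1)
    x_t = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
    # 2) Random horizontal flip (undone after decoding).
    flipped = random.random() < 0.5
    if flipped:
        x_t = torch.flip(x_t, dims=[-1])
    x_hat = codec(x_t)                      # compress + reconstruct
    if flipped:
        x_hat = torch.flip(x_hat, dims=[-1])
    # Resize back to the original resolution.
    return F.interpolate(x_hat, size=(h, w), mode="bilinear", align_corners=False)
```

Because nothing is retrained, a wrapper of this kind can be placed in front of any existing learned compression model.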
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
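For the text-guided side information summarized above, a minimal sketch of how CLIP text features might be fused into a compression network (a FiLM-style channel modulation stands in for the paper's Semantic-Spatial Aware block, whose exact design is not reproduced here; the CLIP checkpoint and feature shapes are assumptions):

```python
# Minimal sketch of fusing CLIP text features into an image-compression backbone.
# The FiLM-style channel modulation is a simple stand-in for the paper's
# Semantic-Spatial Aware block; the model ID and shapes are assumptions.
import torch
import torch.nn as nn
from transformers import CLIPTextModel, CLIPTokenizer

class TextGuidedFusion(nn.Module):
    def __init__(self, feat_channels: int, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.tokenizer = CLIPTokenizer.from_pretrained(clip_name)
        self.text_encoder = CLIPTextModel.from_pretrained(clip_name)
        self.text_encoder.requires_grad_(False)        # frozen side-information branch
        dim = self.text_encoder.config.hidden_size     # 512 for ViT-B/32
        self.to_scale_shift = nn.Linear(dim, 2 * feat_channels)

    def forward(self, feats: torch.Tensor, captions: list[str]) -> torch.Tensor:
        """feats: (B, C, H, W) latent features from the compression encoder/decoder."""
        tokens = self.tokenizer(captions, padding=True, truncation=True,
                                return_tensors="pt").to(feats.device)
        text = self.text_encoder(**tokens).pooler_output            # (B, dim)
        scale, shift = self.to_scale_shift(text).chunk(2, dim=-1)   # (B, C) each
        return feats * (1 + scale[..., None, None]) + shift[..., None, None]

# Usage: fused = TextGuidedFusion(192)(latents, ["a photo of a red bicycle"])
```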
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
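For the caption-as-prompt idea summarized above, a minimal sketch of generating synthetic training data (the Stable Diffusion checkpoint, the toy captions, and the prompt template are illustrative assumptions, not the paper's exact protocol):

```python
# Minimal sketch of synthesizing training images by prompting a text-to-image
# model with (class label, caption) pairs. Model ID, captions, and the prompt
# template are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# In the paper's setting, captions describe images of the target dataset
# (e.g., produced by a captioning model); here we use hand-written toy examples.
captions = {
    "golden retriever": ["a golden retriever catching a frisbee in a park"],
    "tabby cat": ["a tabby cat sleeping on a sunny windowsill"],
}

synthetic_dataset = []
for label, label_captions in captions.items():
    for caption in label_captions:
        prompt = f"{caption}, a photo of a {label}"        # caption-informed prompt
        image = pipe(prompt, num_inference_steps=30).images[0]
        synthetic_dataset.append((image, label))           # later used to train a classifier
```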
- Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models [13.894251782142584]
We propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding.
Our method outperforms other state-of-the-art deep learning methods in terms of both perceptual quality and diversity.
arXiv Detail & Related papers (2022-11-14T22:54:19Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models, each specialized for a different stage of the synthesis process.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Lossless Compression with Latent Variable Models [4.289574109162585]
We perform lossless compression with latent variable models using a scheme we call 'bits back with asymmetric numeral systems' (BB-ANS).
The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data.
We describe 'Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.
arXiv Detail & Related papers (2021-04-21T14:03:05Z)
- Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z)
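For the hyperprior-based entropy model mentioned in the benchmark above, a minimal rate-distortion probe (CompressAI's standard scale-hyperprior model stands in for the paper's coarse-to-fine variant; the library and its pretrained weights are assumed to be available):

```python
# Minimal rate-distortion probe of a hyperprior-based learned codec, using
# CompressAI's scale-hyperprior model as a stand-in for the paper's
# coarse-to-fine hyperprior (assumed third-party library, not the authors' code).
import math

import torch
from compressai.zoo import bmshj2018_hyperprior

net = bmshj2018_hyperprior(quality=3, pretrained=True).eval()

x = torch.rand(1, 3, 256, 256)             # stand-in for a real test image in [0, 1]
with torch.no_grad():
    out = net(x)                           # {"x_hat": ..., "likelihoods": {"y": ..., "z": ...}}

num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
bpp = sum(torch.log(l).sum() for l in out["likelihoods"].values()) / (-math.log(2) * num_pixels)
psnr = -10 * torch.log10(torch.mean((x - out["x_hat"].clamp(0, 1)) ** 2))
print(f"rate: {bpp.item():.3f} bpp, distortion: {psnr.item():.2f} dB PSNR")
```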