Related papers: Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder

URL: http://arxiv.org/abs/2404.04916v2
Date: Thu, 2 May 2024 13:37:13 GMT
Title: Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder
Authors: Yiyang Ma, Wenhan Yang, Jiaying Liu
Abstract summary: This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as a correction. Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.
Score: 49.01721042973929
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The images produced by diffusion models can attain excellent perceptual quality. However, it is challenging for diffusion models to guarantee distortion, so the integration of diffusion models and image compression models still needs more comprehensive exploration. This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as a correction, which achieves better perceptual quality while still guaranteeing distortion to some extent. We build a diffusion model and design a novel paradigm that combines the diffusion model with an end-to-end decoder, the latter being responsible for transmitting the privileged information extracted at the encoder side. Specifically, we theoretically analyze the reconstruction process of diffusion models at the encoder side, where the original images are visible. Based on the analysis, we introduce an end-to-end convolutional decoder to provide a better approximation of the score function $\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t)$ at the encoder side and effectively transmit the combination. Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.
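The abstract hinges on approximating the score function $\nabla_{\mathbf{x}_t}\log p(\mathbf{x}_t)$ when the original image is visible at the encoder. As background only, here is a minimal sketch of the standard DDPM identity behind that idea: under the forward process $q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,\mathbf{x}_0, (1-\bar\alpha_t)\mathbf{I})$, any estimate $\hat{\mathbf{x}}_0$ yields a score estimate $(\sqrt{\bar\alpha_t}\,\hat{\mathbf{x}}_0 - \mathbf{x}_t)/(1-\bar\alpha_t)$, which is exact for $\nabla\log p(\mathbf{x}_t)$ when $\hat{\mathbf{x}}_0 = \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ (Tweedie's formula). The linear beta schedule, the function names, and the noisy "decoder output" are hypothetical; this is not the paper's correction scheme.

```python
import numpy as np

# Hypothetical DDPM-style linear beta schedule (not taken from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def forward_noise(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) x0, (1 - a_t) I) using the given noise eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def score_from_x0_estimate(x_t, x0_hat, t):
    """Score estimate (sqrt(a_t) * x0_hat - x_t) / (1 - a_t).

    With x0_hat = E[x_0 | x_t] this equals grad log p(x_t) (Tweedie's formula);
    with x0_hat = x_0 it equals grad log q(x_t | x_0)."""
    a = alpha_bar[t]
    return (np.sqrt(a) * x0_hat - x_t) / (1.0 - a)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in "image" scaled to [-1, 1]
eps = rng.standard_normal(x0.shape)
t = 500
x_t = forward_noise(x0, t, eps)

# With the original image visible (encoder side), the conditional score is -eps / sqrt(1 - a_t).
exact = score_from_x0_estimate(x_t, x0, t)
print(np.allclose(exact, -eps / np.sqrt(1.0 - alpha_bar[t])))  # True

# A hypothetical decoder reconstruction with small error gives an approximate score.
x0_dec = x0 + 0.05 * rng.standard_normal(x0.shape)
approx = score_from_x0_estimate(x_t, x0_dec, t)
err = np.abs(approx - exact).max()
```

In this estimator the score error equals $\sqrt{\bar\alpha_t}/(1-\bar\alpha_t)$ times the reconstruction error, so the same decoder error is amplified most at small $t$, where $1-\bar\alpha_t$ approaches zero.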

Related papers

Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model [55.2480439325792]
We propose a hybrid compression scheme optimized for perceptual quality, extending the approach of the CDC model with a decoder network. We achieve up to +2 dB PSNR fidelity improvements while maintaining comparable LPIPS and FID perceptual scores when compared with CDC.
arXiv (2025-05-19T14:13:14Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv (2025-03-27T09:08:39Z)
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder [52.698595889988766]
We present a novel perspective on learning video embedders for generative modeling. Rather than requiring an exact reproduction of an input video, an effective embedder should focus on visually plausible reconstructions. We propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework.
arXiv (2025-03-11T17:51:07Z)
Sample what you can't compress [6.24979299238534]
We show how to learn a continuous encoder and decoder under a diffusion-based loss. This approach yields better reconstruction quality as compared to GAN-based autoencoders. We also show that the resulting representation is easier to model with a latent diffusion model as compared to the representation obtained from a state-of-the-art GAN-based loss.
arXiv (2024-09-04T08:42:42Z)
Zero-Shot Image Compression with Diffusion-Based Posterior Sampling [34.50287066865267]
This work addresses the gap by harnessing the image prior learned by existing pre-trained diffusion models for solving the task of lossy image compression. Our method, PSC (Posterior Sampling-based Compression), utilizes zero-shot diffusion-based posterior samplers. PSC achieves competitive results compared to established methods, paving the way for further exploration of pre-trained diffusion models and posterior samplers for image compression.
arXiv (2024-07-13T14:24:22Z)
Lossy Image Compression with Foundation Diffusion Models [10.407650300093923]
In this work we formulate the removal of quantization error as a denoising task, using diffusion to recover lost information in the transmitted image latent. Our approach allows us to perform less than 10% of the full diffusion generative process and requires no architectural changes to the diffusion model.
arXiv (2024-04-12T16:23:42Z)
Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders [7.485128109817576]
We show that conditional diffusion models can lead to promising results in the generative compression task when used as a decoder.
arXiv (2024-03-05T11:48:35Z)
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features. We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps. We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv (2023-12-15T08:46:43Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv (2023-05-22T12:18:20Z)
Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv (2023-04-06T17:59:56Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv (2022-09-16T11:00:50Z)
Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv (2022-09-14T21:53:27Z)
Lossy Compression with Gaussian Diffusion [28.930398810600504]
We describe a novel lossy compression approach called DiffC which is based on unconditional diffusion generative models. We implement a proof of concept and find that it works surprisingly well despite the lack of an encoder transform. We show that a flow-based reconstruction achieves a 3 dB gain over ancestral sampling at high bitrates.
arXiv (2022-06-17T16:46:31Z)
Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images. We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv (2020-06-22T04:04:56Z)
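Several entries in this list report fidelity as PSNR gains (for example, +2 dB over CDC for the latent conditioned residual model, and 3 dB for flow-based reconstruction over ancestral sampling in DiffC). Since PSNR = 10 log10(MAX² / MSE), a gain of Δ dB at a fixed MAX means the MSE shrank by a factor of 10^(Δ/10). The sketch below makes that arithmetic concrete; it assumes 8-bit images (MAX = 255), and the function names are illustrative rather than taken from any of the listed papers.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink for PSNR to rise by gain_db (at fixed MAX)."""
    return 10.0 ** (gain_db / 10.0)

print(round(mse_ratio_for_gain(2.0), 3))  # 1.585
print(round(mse_ratio_for_gain(3.0), 3))  # 1.995
```

So a +2 dB gain corresponds to roughly 37% lower MSE, and a 3 dB gain to roughly halving it.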

This list is automatically generated from the titles and abstracts of the papers on this site.