CoD: A Diffusion Foundation Model for Image Compression
- URL: http://arxiv.org/abs/2511.18706v1
- Date: Mon, 24 Nov 2025 03:00:15 GMT
- Title: CoD: A Diffusion Foundation Model for Image Compression
- Authors: Zhaoyang Jia, Zihan Zheng, Naifu Xue, Jiahao Li, Bin Li, Zongyu Guo, Xiaoyi Zhang, Houqiang Li, Yan Lu,
- Abstract summary: Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. \textbf{CoD} can be trained from scratch to enable end-to-end optimization of both compression and generation.
- Score: 57.572664625372106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hindering the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address it, we introduce \textbf{CoD}, the first \textbf{Co}mpression-oriented \textbf{D}iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages: \textbf{High compression efficiency}, replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp); \textbf{Low-cost and reproducible training}, 300$\times$ faster training than Stable Diffusion ($\sim$ 20 vs. $\sim$ 6,250 A100 GPU days) on entirely open image-only datasets; \textbf{Providing new insights}, e.g., We find pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Codes will be released.
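The "0.0039 bpp" figure in the abstract can be grounded with simple arithmetic: bits per pixel is the compressed size in bits divided by the pixel count. A minimal sketch (function name is illustrative, not from the paper):

```python
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Bits per pixel (bpp) = total compressed bits / number of pixels."""
    return compressed_bytes * 8 / (width * height)

# At 0.0039 bpp, a 1024x1024 image must fit in roughly 511 bytes:
budget_bytes = 0.0039 * 1024 * 1024 / 8
```

This makes concrete how extreme the ultra-low-bitrate regime is: an entire megapixel image in about half a kilobyte.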
Related papers
- YODA: Yet Another One-step Diffusion-based Video Compressor [55.356234617448905]
One-step diffusion models have recently excelled in perceptual image compression, but their application to video remains limited. We present YODA, a one-step diffusion-based video compressor that embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming deep-learning baselines on LPIPS, DISTS, FID, and KID.
arXiv Detail & Related papers (2026-01-03T10:12:07Z) - DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation [93.6273078684831]
We propose a frequency-DeCoupled pixel diffusion framework (DeCo) to pursue a more efficient pixel diffusion paradigm. To decouple the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance. Experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FID of 1.62 (256x256) and 2.22 (512x512) on ImageNet.
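The frequency-decoupling idea can be illustrated on a 1-D signal: a smoothing filter extracts the low-frequency part, the residual is the high-frequency part, and the two sum back to the original. This is a toy sketch of the general principle, not DeCo's actual decomposition:

```python
def decouple(signal, k=3):
    """Split a 1-D signal into a smooth low-frequency part (moving average)
    and a high-frequency residual; the two parts sum back to the original."""
    n = len(signal)
    low = []
    for i in range(n):
        window = signal[max(0, i - k):min(n, i + k + 1)]
        low.append(sum(window) / len(window))
    high = [s - lo for s, lo in zip(signal, low)]
    return low, high

sig = [0, 1, 0, 1, 8, 1, 0, 1, 0]
low, high = decouple(sig)
recon = [lo + hi for lo, hi in zip(low, high)]  # matches the original signal
```

In DeCo the analogous split is learned in 2-D, with the high-frequency branch handled by a lightweight decoder.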
arXiv Detail & Related papers (2025-11-24T17:59:06Z) - Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression [36.10674664089876]
SODEC is a novel single-step diffusion-based image compression model. It mitigates the loss of fidelity that results from over-reliance on generative priors. It significantly outperforms existing methods, achieving superior rate-distortion-perception performance.
arXiv Detail & Related papers (2025-08-07T02:24:03Z) - StableCodec: Taming One-Step Diffusion for Extreme Image Compression [19.69733852050049]
Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrates (less than 0.05 bits per pixel) with high realism. However, current approaches require a large number of denoising steps at the decoder to generate realistic results under extreme bitrate constraints. We introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression.
arXiv Detail & Related papers (2025-06-27T07:39:21Z) - Single-step Diffusion for Image Compression at Ultra-Low Bitrates [19.76457078979179]
We propose a single-step diffusion model for image compression that delivers high perceptual quality and fast decoding at ultra-low bitrates. Our approach incorporates two key innovations: (i) Vector-Quantized Residual (VQ-Residual) training, which factorizes compression into a structural base code and a learned residual in latent space. Our method achieves compression performance comparable to state-of-the-art methods while improving decoding speed by about 50x.
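The base-plus-residual idea behind VQ-Residual coding can be sketched with plain vector quantization: snap a vector to its nearest codebook entry (the coarse structural code) and keep the difference for later refinement. Names and the tiny codebook here are hypothetical:

```python
def vq_residual(vector, codebook):
    """Quantize to the nearest codebook entry (the structural base code)
    and return its index plus the residual left for a refinement stage."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))
    residual = [v - c for v, c in zip(vector, codebook[idx])]
    return idx, residual

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
idx, res = vq_residual([0.9, 1.2], codebook)  # idx == 1; residual is small
```

Only the index needs to be entropy-coded at the base rate; in the paper's setting the residual is learned in latent space rather than transmitted directly.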
arXiv Detail & Related papers (2025-06-19T19:53:27Z) - OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [39.746866725267516]
Pretrained latent diffusion models have shown strong potential for lossy image compression. We propose a one-step diffusion codec across multiple bit-rates, termed OSCAR. Experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics.
arXiv Detail & Related papers (2025-05-22T00:14:12Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders generation quality.
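The kind of diagnostic such a spectral analysis relies on can be sketched in 1-D: compute a DFT of a signal (standing in for one row of a latent) and measure what fraction of the energy sits above a cutoff frequency. This is an illustrative toy, not the paper's method:

```python
import cmath

def high_freq_fraction(x, cutoff):
    """Fraction of spectral energy above a cutoff frequency index, via a naive DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    energy = [abs(c) ** 2 for c in spectrum]
    # Count bins above `cutoff` (excluding the negative-frequency mirrors below n - cutoff).
    high = sum(e for f, e in enumerate(energy) if cutoff < f < n - cutoff)
    return high / sum(energy)

smooth = [1.0] * 8             # constant signal: no high-frequency energy
alternating = [1.0, -1.0] * 4  # Nyquist-rate signal: all energy is high-frequency
```

A latent space where this fraction is inordinately large is what the paper flags as hurting the coarse-to-fine diffusion process.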
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Lossy Compression with Pretrained Diffusion Models [4.673285689826945]
A principled algorithm for lossy compression using pretrained diffusion models has been understood since at least Ho et al. 2020. We introduce simple workarounds that lead to the first complete implementation of DiffC. Despite requiring no additional training, our method is competitive with other state-of-the-art generative compression methods at ultra-low bitrates.
arXiv Detail & Related papers (2025-01-16T20:02:13Z) - PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher [55.22994720855957]
PaGoDA is a novel pipeline that reduces the training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion, and progressive super-resolution.
With the proposed pipeline, PaGoDA achieves a $64\times$ reduction in training cost by training its diffusion model on $8\times$ downsampled data.
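The $64\times$ figure follows directly from pixel-count arithmetic, assuming training cost scales roughly with the number of pixels processed:

```python
def pixels(width, height):
    return width * height

full = pixels(512, 512)          # original resolution
down = pixels(512 // 8, 512 // 8)  # 8x downsampled along each spatial axis
cost_ratio = full / down         # 8 * 8 = 64x fewer pixels per image
```

Downsampling by 8 along each of the two spatial axes shrinks the pixel count quadratically, which is where the quoted cost reduction comes from.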
PaGoDA's pipeline can be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models.
arXiv Detail & Related papers (2024-05-23T17:39:09Z) - Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior [8.772652777234315]
We propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models.
Our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates.
arXiv Detail & Related papers (2024-04-29T16:02:38Z) - Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder [49.01721042973929]
This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction.
Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.
arXiv Detail & Related papers (2024-04-07T10:57:54Z) - Extreme Video Compression with Pre-trained Diffusion Models [11.898317376595697]
We present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder.
The entire video is sequentially encoded to achieve a visually pleasing reconstruction, considering perceptual quality metrics.
Results showcase the potential of exploiting the temporal relations in video data using generative models.
arXiv Detail & Related papers (2024-02-14T04:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.