CoD: A Diffusion Foundation Model for Image Compression
- URL: http://arxiv.org/abs/2511.18706v1
- Date: Mon, 24 Nov 2025 03:00:15 GMT
- Title: CoD: A Diffusion Foundation Model for Image Compression
- Authors: Zhaoyang Jia, Zihan Zheng, Naifu Xue, Jiahao Li, Bin Li, Zongyu Guo, Xiaoyi Zhang, Houqiang Li, Yan Lu,
- Abstract summary: Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. \textbf{CoD} can be trained from scratch to enable end-to-end optimization of both compression and generation.
- Score: 57.572664625372106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hindering the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address it, we introduce \textbf{CoD}, the first \textbf{Co}mpression-oriented \textbf{D}iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages: \textbf{High compression efficiency}, replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp); \textbf{Low-cost and reproducible training}, 300$\times$ faster training than Stable Diffusion ($\sim$ 20 vs. $\sim$ 6,250 A100 GPU days) on entirely open image-only datasets; \textbf{Providing new insights}, e.g., We find pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Codes will be released.
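The "0.0039 bpp" figure in the abstract can be grounded with simple arithmetic: bits per pixel is the compressed size in bits divided by the pixel count. A minimal sketch (function name is illustrative, not from the paper):

```python
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Bits per pixel (bpp) = total compressed bits / number of pixels."""
    return compressed_bytes * 8 / (width * height)

# At 0.0039 bpp, a 1024x1024 image must fit in roughly 511 bytes:
budget_bytes = 0.0039 * 1024 * 1024 / 8
```

This makes concrete how extreme the ultra-low-bitrate regime is: an entire megapixel image in about half a kilobyte.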
Related papers
- YODA: Yet Another One-step Diffusion-based Video Compressor [55.356234617448905]
One-step diffusion models have recently excelled in perceptual image compression, but their application to video remains limited. We present YODA, a one-step diffusion-based video compressor that embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming deep-learning baselines on LPIPS, DISTS, FID, and KID.
arXiv Detail & Related papers (2026-01-03T10:12:07Z) - DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation [93.6273078684831]
We propose a frequency-DeCoupled pixel diffusion framework (DeCo) to pursue a more efficient pixel diffusion paradigm. To decouple the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance. Experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FID of 1.62 (256x256) and 2.22 (512x512) on ImageNet.
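The frequency-decoupling idea can be illustrated on a 1-D signal: a smoothing filter extracts the low-frequency part, the residual is the high-frequency part, and the two sum back to the original. This is a toy sketch of the general principle, not DeCo's actual decomposition:

```python
def decouple(signal, k=3):
    """Split a 1-D signal into a smooth low-frequency part (moving average)
    and a high-frequency residual; the two parts sum back to the original."""
    n = len(signal)
    low = []
    for i in range(n):
        window = signal[max(0, i - k):min(n, i + k + 1)]
        low.append(sum(window) / len(window))
    high = [s - lo for s, lo in zip(signal, low)]
    return low, high

sig = [0, 1, 0, 1, 8, 1, 0, 1, 0]
low, high = decouple(sig)
recon = [lo + hi for lo, hi in zip(low, high)]  # matches the original signal
```

In DeCo the analogous split is learned in 2-D, with the high-frequency branch handled by a lightweight decoder.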
arXiv Detail & Related papers (2025-11-24T17:59:06Z) - Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression [36.10674664089876]
SODEC is a novel single-step diffusion-based image compression model. It mitigates the loss of fidelity that results from over-reliance on generative priors. It significantly outperforms existing methods, achieving superior rate-distortion-perception performance.
arXiv Detail & Related papers (2025-08-07T02:24:03Z) - StableCodec: Taming One-Step Diffusion for Extreme Image Compression [19.69733852050049]
Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrates (less than 0.05 bits per pixel) with high realism. However, current approaches require a large number of denoising steps at the decoder to generate realistic results under extreme bitrate constraints. We introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression.
arXiv Detail & Related papers (2025-06-27T07:39:21Z) - Single-step Diffusion for Image Compression at Ultra-Low Bitrates [19.76457078979179]
We propose a single-step diffusion model for image compression that delivers high perceptual quality and fast decoding at ultra-low bitrates. Our approach incorporates two key innovations: (i) Vector-Quantized Residual (VQ-Residual) training, which factorizes compression into a structural base code and a learned residual in latent space. Our method achieves compression performance comparable to state-of-the-art methods while improving decoding speed by about 50x.
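The base-plus-residual idea behind VQ-Residual coding can be sketched with plain vector quantization: snap a vector to its nearest codebook entry (the coarse structural code) and keep the difference for later refinement. Names and the tiny codebook here are hypothetical:

```python
def vq_residual(vector, codebook):
    """Quantize to the nearest codebook entry (the structural base code)
    and return its index plus the residual left for a refinement stage."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))
    residual = [v - c for v, c in zip(vector, codebook[idx])]
    return idx, residual

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
idx, res = vq_residual([0.9, 1.2], codebook)  # idx == 1; residual is small
```

Only the index needs to be entropy-coded at the base rate; in the paper's setting the residual is learned in latent space rather than transmitted directly.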
arXiv Detail & Related papers (2025-06-19T19:53:27Z) - OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [39.746866725267516]
Pretrained latent diffusion models have shown strong potential for lossy image compression. We propose a one-step diffusion codec across multiple bit-rates, termed OSCAR. Experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics.
arXiv Detail & Related papers (2025-05-22T00:14:12Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders generation quality.
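The kind of diagnostic such a spectral analysis relies on can be sketched in 1-D: compute a DFT of a signal (standing in for one row of a latent) and measure what fraction of the energy sits above a cutoff frequency. This is an illustrative toy, not the paper's method:

```python
import cmath

def high_freq_fraction(x, cutoff):
    """Fraction of spectral energy above a cutoff frequency index, via a naive DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    energy = [abs(c) ** 2 for c in spectrum]
    # Count bins above `cutoff` (excluding the negative-frequency mirrors below n - cutoff).
    high = sum(e for f, e in enumerate(energy) if cutoff < f < n - cutoff)
    return high / sum(energy)

smooth = [1.0] * 8             # constant signal: no high-frequency energy
alternating = [1.0, -1.0] * 4  # Nyquist-rate signal: all energy is high-frequency
```

A latent space where this fraction is inordinately large is what the paper flags as hurting the coarse-to-fine diffusion process.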
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Lossy Compression with Pretrained Diffusion Models [4.673285689826945]
A principled algorithm for lossy compression using pretrained diffusion models has been understood since at least Ho et al. 2020. We introduce simple workarounds that lead to the first complete implementation of DiffC. Despite requiring no additional training, our method is competitive with other state-of-the-art generative compression methods at ultra-low bitrates.
arXiv Detail & Related papers (2025-01-16T20:02:13Z) - PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher [55.22994720855957]
PaGoDA is a novel pipeline that reduces the training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion, and progressive super-resolution.
With the proposed pipeline, PaGoDA achieves a $64\times$ reduction in training cost by training its diffusion model on $8\times$ downsampled data.
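The $64\times$ figure follows directly from pixel-count arithmetic, assuming training cost scales roughly with the number of pixels processed:

```python
def pixels(width, height):
    return width * height

full = pixels(512, 512)          # original resolution
down = pixels(512 // 8, 512 // 8)  # 8x downsampled along each spatial axis
cost_ratio = full / down         # 8 * 8 = 64x fewer pixels per image
```

Downsampling by 8 along each of the two spatial axes shrinks the pixel count quadratically, which is where the quoted cost reduction comes from.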
PaGoDA's pipeline can be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models.
arXiv Detail & Related papers (2024-05-23T17:39:09Z) - Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior [8.772652777234315]
We propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models.
Our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates.
arXiv Detail & Related papers (2024-04-29T16:02:38Z) - Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder [49.01721042973929]
This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction.
Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.
arXiv Detail & Related papers (2024-04-07T10:57:54Z) - Extreme Video Compression with Pre-trained Diffusion Models [11.898317376595697]
We present a novel approach to extreme video compression leveraging the predictive power of diffusion-based generative models at the decoder.
The entire video is sequentially encoded to achieve a visually pleasing reconstruction, considering perceptual quality metrics.
Results showcase the potential of exploiting the temporal relations in video data using generative models.
arXiv Detail & Related papers (2024-02-14T04:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.