Related papers: One-Step Diffusion-Based Image Compression with Semantic Distillation

URL: http://arxiv.org/abs/2505.16687v1
Date: Thu, 22 May 2025 13:54:09 GMT
Title: One-Step Diffusion-Based Image Compression with Semantic Distillation
Authors: Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu
Abstract summary: OneDC is a One-step Diffusion-based generative image Codec. OneDC achieves SOTA perceptual quality even with one-step generation.
Score: 25.910952778218146
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces unpleasing latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec that integrates a latent compression module with a one-step diffusion generator. Recognizing the critical role of semantic guidance in one-step diffusion, we propose using the hyperprior as a semantic signal, overcoming the limitations of text prompts in representing complex visual content. To further enhance the semantic capability of the hyperprior, we introduce a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec. Additionally, we adopt a hybrid pixel- and latent-domain optimization to jointly enhance both reconstruction fidelity and perceptual realism. Extensive experiments demonstrate that OneDC achieves SOTA perceptual quality even with one-step generation, offering over 40% bitrate reduction and 20x faster decoding compared to prior multi-step diffusion-based codecs. Code will be released later.

Related papers

YODA: Yet Another One-step Diffusion-based Video Compressor [55.356234617448905]
While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. We present YODA, Yet Another One-step Diffusion-based Video Compressor, which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming deep-learning baselines on LPIPS, DISTS, FID, and KID.
arXiv Detail & Related papers (2026-01-03T10:12:07Z)
Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance [24.88807532823577]
We propose S2VC, a Single-Step diffusion-based Video Codec that integrates a conditional coding framework with an efficient single-step diffusion generator. We show that S2VC delivers state-of-the-art perceptual quality with an average 52.73% saving over prior perceptual methods.
arXiv Detail & Related papers (2025-12-08T12:05:30Z)
DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework [45.134271969594614]
We first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. We employ an End-to-End Finetuning strategy to improve overall compression performance.
arXiv Detail & Related papers (2025-08-11T06:59:23Z)
Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression [36.10674664089876]
SODEC is a novel single-step diffusion-based image compression model. It addresses the loss of fidelity that results from over-reliance on generative priors. It significantly outperforms existing methods, achieving superior rate-distortion-perception performance.
arXiv Detail & Related papers (2025-08-07T02:24:03Z)
StableCodec: Taming One-Step Diffusion for Extreme Image Compression [19.69733852050049]
Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrate coding (less than 0.05 bits per pixel) with high realism. Current approaches require a large number of denoising steps at the decoder to generate realistic results under extreme constraints. We introduce StableCodec, which enables one-step diffusion for high-fidelity and high-realism extreme image compression.
arXiv Detail & Related papers (2025-06-27T07:39:21Z)
DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates [7.344746778324299]
We propose the first single-step diffusion model for image compression (DiffO) that delivers high perceptual quality and fast decoding at ultra-low bitrates. Experiments show that DiffO surpasses state-of-the-art compression performance while improving decoding speed by 50x compared to prior diffusion-based methods.
arXiv Detail & Related papers (2025-06-19T19:53:27Z)
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [52.65036099944483]
Pretrained latent diffusion models have shown strong potential for lossy image compression. Most existing methods reconstruct images by iteratively denoising from random noise. We propose a one-step diffusion codec across multiple bit-rates, termed OSCAR.
arXiv Detail & Related papers (2025-05-22T00:14:12Z)
Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion [28.61304513668606]
ResULIC is a residual-guided ultra lowrate image compression system. It incorporates residual signals into both semantic retrieval and the diffusion-based generation process. It achieves superior objective and subjective performance compared to state-of-the-art diffusion-based methods.
arXiv Detail & Related papers (2025-05-13T06:51:23Z)
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
Epsilon-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input. We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [66.63250537475973]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
arXiv Detail & Related papers (2024-07-26T02:34:25Z)
Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos. Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization. Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features. We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps. We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.