YODA: Yet Another One-step Diffusion-based Video Compressor
- URL: http://arxiv.org/abs/2601.01141v1
- Date: Sat, 03 Jan 2026 10:12:07 GMT
- Title: YODA: Yet Another One-step Diffusion-based Video Compressor
- Authors: Xingchen Li, Junzhe Zhang, Junqi Shi, Ming Lu, Zhan Ma
- Abstract summary: While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. We present YODA, Yet Another One-step Diffusion-based Video Compressor, which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial-temporal correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming traditional and deep-learning baselines on LPIPS, DISTS, FID, and KID.
- Score: 55.356234617448905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. Prior efforts typically rely on pretrained 2D autoencoders that generate per-frame latent representations independently, thereby neglecting temporal dependencies. We present YODA--Yet Another One-step Diffusion-based Video Compressor--which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial-temporal correlations for more compact representation, and employs a linear Diffusion Transformer (DiT) for efficient one-step denoising. YODA achieves state-of-the-art perceptual performance, consistently outperforming traditional and deep-learning baselines on LPIPS, DISTS, FID, and KID. Source code will be publicly available at https://github.com/NJUVISION/YODA.
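To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the decoding loop the abstract outlines: each frame's latent is denoised in a single step, conditioned on multiscale features extracted from the previously reconstructed frame. All module names, channel counts, and the plain conv stack standing in for the linear DiT are illustrative assumptions, not YODA's actual implementation.

```python
# Hedged sketch of one-step, temporally conditioned latent decoding.
# Everything here is an illustrative stand-in, not YODA's real code.
import torch
import torch.nn as nn

class MultiscaleRefEncoder(nn.Module):
    """Extracts features from the previous reconstruction at two scales
    (hypothetical stand-in for the temporal-reference feature extractor)."""
    def __init__(self, ch=64):
        super().__init__()
        self.s1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)   # 1/2 resolution
        self.s2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # 1/4 resolution

    def forward(self, ref):
        f1 = torch.relu(self.s1(ref))
        f2 = torch.relu(self.s2(f1))
        return [f1, f2]

class OneStepDenoiser(nn.Module):
    """Predicts the clean latent from the noisy latent in a single pass,
    conditioned on reference features. A conv stack stands in for the
    paper's linear DiT."""
    def __init__(self, latent_ch=4, ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(latent_ch + ch, ch, 3, padding=1)
        self.out = nn.Conv2d(ch, latent_ch, 3, padding=1)

    def forward(self, z_noisy, ref_feats):
        cond = ref_feats[1]  # 1/4-resolution features match the latent grid
        h = torch.relu(self.fuse(torch.cat([z_noisy, cond], dim=1)))
        return self.out(h)

# Decode a short clip: each frame's latent gets ONE denoising step,
# conditioned on features from the previously reconstructed frame.
ref_enc, denoiser = MultiscaleRefEncoder(), OneStepDenoiser()
prev_frame = torch.zeros(1, 3, 256, 256)  # e.g. the decoded I-frame
for z_noisy in [torch.randn(1, 4, 64, 64) for _ in range(3)]:  # decoded latents
    feats = ref_enc(prev_frame)
    z_hat = denoiser(z_noisy, feats)          # single denoising step
    prev_frame = nn.functional.interpolate(   # stand-in for the pixel decoder
        z_hat[:, :3], scale_factor=4, mode="bilinear")
```

The appeal of the one-step formulation is that it replaces the iterative sampling loop of a standard diffusion decoder with a single forward pass, keeping decoding cost closer to that of a conventional neural codec.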
Related papers
- Generative Neural Video Compression via Video Diffusion Prior [33.164111717707414]
The first DiT-based generative neural video compression framework built upon an advanced video generation foundation model.
arXiv Detail & Related papers (2025-12-04T17:27:32Z) - Uniform Discrete Diffusion with Metric Path for Video Generation [103.86033350602908]
Continuous-space video generation has advanced rapidly, while discrete approaches lag behind due to error accumulation and long-duration inconsistency. We present URSA, a uniform discrete diffusion framework with a metric path that bridges the gap with continuous approaches for scalable video generation. URSA consistently outperforms existing discrete methods and achieves performance comparable to state-of-the-art continuous diffusion methods.
arXiv Detail & Related papers (2025-10-28T17:59:57Z) - DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework [45.134271969594614]
We first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. We employ an End-to-End Finetuning strategy to improve overall compression performance.
arXiv Detail & Related papers (2025-08-11T06:59:23Z) - Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression [36.10674664089876]
SODEC is a novel single-step diffusion-based image compression model. It mitigates the fidelity loss that results from over-reliance on generative priors. It significantly outperforms existing methods, achieving superior rate-distortion-perception performance.
arXiv Detail & Related papers (2025-08-07T02:24:03Z) - READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation [55.58089937219475]
We propose READ, the first real-time diffusion-transformer-based talking head generation framework. Our approach first learns a highly compressed video latent space via a VAE, significantly reducing the token count for generation. We show that READ outperforms state-of-the-art methods, generating competitive talking head videos with significantly reduced runtime.
arXiv Detail & Related papers (2025-08-05T13:57:03Z) - One-Way Ticket:Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models [65.96186414865747]
Text-to-Image (T2I) diffusion models face a trade-off between inference speed and image quality. We introduce TiUE, the first Time-independent Unified Encoder, for the student model's UNet architecture. Using a one-pass scheme, TiUE shares encoder features across multiple decoder time steps, enabling parallel sampling (see the first sketch after this list).
arXiv Detail & Related papers (2025-05-28T04:23:22Z) - One-Step Diffusion-Based Image Compression with Semantic Distillation [25.910952778218146]
OneDC is a One-step Diffusion-based generative image Codec. OneDC achieves high perceptual quality even with one-step generation.
arXiv Detail & Related papers (2025-05-22T13:54:09Z) - OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [39.746866725267516]
Pretrained latent diffusion models have shown strong potential for lossy image compression. We propose a one-step diffusion codec across multiple bit-rates, termed OSCAR. Experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics.
arXiv Detail & Related papers (2025-05-22T00:14:12Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
The new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3x inference speedup) still performs comparably with top baselines (a rough sketch of this objective appears after this list).
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with a recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rates.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.