DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework
- URL: http://arxiv.org/abs/2508.07682v1
- Date: Mon, 11 Aug 2025 06:59:23 GMT
- Title: DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework
- Authors: Wenzhuo Ma, Zhenzhong Chen
- Abstract summary: We first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. We employ an End-to-End Finetuning strategy to improve overall compression performance.
- Score: 45.134271969594614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. Unlike conventional multi-step diffusion-based methods, DiffVC-OSD feeds the reconstructed latent representation directly into a One-Step Diffusion Model, enhancing perceptual quality through a single diffusion step guided by both the temporal context and the latent itself. To better leverage temporal dependencies, we design a Temporal Context Adapter that encodes conditional inputs into multi-level features, offering more fine-grained guidance for the Denoising UNet. Additionally, we employ an End-to-End Finetuning strategy to improve overall compression performance. Extensive experiments demonstrate that DiffVC-OSD achieves state-of-the-art perceptual compression performance, and offers about 20$\times$ faster decoding and an 86.92\% bitrate reduction compared to the corresponding multi-step diffusion-based variant.
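The abstract describes a single guided denoising step conditioned on multi-level temporal features. As a rough illustration, here is a minimal PyTorch-style sketch of that decode path; the module names, channel widths, fixed timestep, and conditioning mechanism below are illustrative assumptions, not the authors' actual architecture.

# A minimal sketch of the one-step decode path described above.
# All names and shapes are hypothetical, for illustration only.
import torch
import torch.nn as nn

class TemporalContextAdapter(nn.Module):
    """Encodes the temporal context into multi-level features that
    condition the denoising UNet at matching resolutions."""
    def __init__(self, in_ch: int = 64, levels: tuple = (64, 128, 256)):
        super().__init__()
        stages, ch = [], in_ch
        for out_ch in levels:
            stages.append(nn.Sequential(
                nn.Conv2d(ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.SiLU(),
            ))
            ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, context: torch.Tensor) -> list:
        feats, x = [], context
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one conditioning tensor per UNet level
        return feats

class DenoisingUNetStub(nn.Module):
    """Stand-in for the denoising UNet; a real model would fuse the
    multi-level conditioning features inside its encoder/decoder."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.refine = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, z, t, cond_feats):
        return z + self.refine(z)  # one refinement pass = one diffusion step

def one_step_decode(latent_hat, context, adapter, denoiser):
    """Enhance the reconstructed latent with a single guided diffusion
    step -- no iterative sampling loop, hence the fast decoding."""
    cond_feats = adapter(context)
    # A single fixed timestep replaces the multi-step reverse schedule.
    t = torch.zeros(latent_hat.size(0), dtype=torch.long)
    return denoiser(latent_hat, t, cond_feats)

# Toy usage with random tensors standing in for decoder outputs.
adapter, denoiser = TemporalContextAdapter(), DenoisingUNetStub()
z_hat = torch.randn(1, 64, 32, 32)  # reconstructed latent from the bitstream
ctx = torch.randn(1, 64, 32, 32)    # temporal context from previous frames
refined = one_step_decode(z_hat, ctx, adapter, denoiser)

The key property the sketch mirrors is that decoding costs one UNet pass regardless of the diffusion schedule, which is where the reported ~20$\times$ decoding speedup over the multi-step variant comes from.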
Related papers
- MTC-VAE: Multi-Level Temporal Compression with Content Awareness [54.85288415164888]
Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. We present a technique to convert fixed-compression-rate VAEs into models that support multi-level temporal compression.
arXiv Detail & Related papers (2026-02-01T17:08:02Z)
- YODA: Yet Another One-step Diffusion-based Video Compressor [55.356234617448905]
One-step diffusion models have recently excelled in perceptual image compression, but their application to video remains limited. We present YODA, which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming deep-learning baselines on LPIPS, DISTS, FID, and KID.
arXiv Detail & Related papers (2026-01-03T10:12:07Z)
- Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression [36.10674664089876]
SODEC is a novel single-step diffusion-based image compression model. It mitigates the fidelity loss that results from over-reliance on generative priors, and it significantly outperforms existing methods, achieving superior rate-distortion-perception performance.
arXiv Detail & Related papers (2025-08-07T02:24:03Z)
- One-Step Diffusion-Based Image Compression with Semantic Distillation [25.910952778218146]
OneDC is a One-step Diffusion-based generative image Codec. It achieves high perceptual quality even with one-step generation.
arXiv Detail & Related papers (2025-05-22T13:54:09Z)
- OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [52.65036099944483]
Pretrained latent diffusion models have shown strong potential for lossy image compression, but most existing methods reconstruct images by iteratively denoising from random noise. We propose a one-step diffusion codec across multiple bit-rates, termed OSCAR.
arXiv Detail & Related papers (2025-05-22T00:14:12Z)
- EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation [95.60655992032316]
We introduce EDEN, an Enhanced Diffusion for high-quality large-motion vidEo frame iNterpolation. Our approach first utilizes a transformer-based tokenizer to produce refined latent representations of the intermediate frames for diffusion models. We then enhance the diffusion transformer with temporal attention across the process and incorporate a start-end frame difference embedding to guide the generation of dynamic motion.
arXiv Detail & Related papers (2025-03-20T03:54:52Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new Conditioned Diffusion-based Tokenizer (CDT) replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3$\times$ inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
- Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse [45.134271969594614]
DiffVC is a diffusion-based perceptual neural video compression framework. It integrates a foundation diffusion model with the video conditional coding paradigm. We show that our proposed solution delivers excellent performance in both perception metrics and visual quality.
arXiv Detail & Related papers (2025-01-23T10:23:04Z)