Generative Neural Video Compression via Video Diffusion Prior
- URL: http://arxiv.org/abs/2512.05016v1
- Date: Thu, 04 Dec 2025 17:27:32 GMT
- Title: Generative Neural Video Compression via Video Diffusion Prior
- Authors: Qi Mao, Hao Cheng, Tinghan Yang, Libiao Jin, Siwei Ma,
- Abstract summary: The first DiT-based generative neural video compression framework built upon an advanced video generation foundation model.
- Score: 33.164111717707414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and sequence-level generative refinement are unified within a single codec. Existing perceptual codecs primarily rely on pre-trained image generative priors to restore high-frequency details, but their frame-wise nature lacks temporal modeling and inevitably leads to perceptual flickering. To address this, GNVC-VD introduces a unified flow-matching latent refinement module that leverages a video diffusion transformer to jointly enhance intra- and inter-frame latents through sequence-level denoising, ensuring consistent spatio-temporal details. Instead of denoising from pure Gaussian noise as in video generation, GNVC-VD initializes refinement from decoded spatio-temporal latents and learns a correction term that adapts the diffusion prior to compression-induced degradation. A conditioning adaptor further injects compression-aware cues into intermediate DiT layers, enabling effective artifact removal while maintaining temporal coherence under extreme bitrate constraints. Extensive experiments show that GNVC-VD surpasses both traditional and learned codecs in perceptual quality and significantly reduces the flickering artifacts that persist in prior generative approaches, even below 0.01 bpp, highlighting the promise of integrating video-native generative priors into neural codecs for next-generation perceptual video compression.
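The abstract describes initializing flow-matching refinement from decoded spatio-temporal latents rather than pure Gaussian noise, and learning a correction term that adapts the diffusion prior to compression degradation. A minimal sketch of that idea is below; this is an illustration only, not the authors' implementation, and `velocity_fn` is a toy stand-in for the learned DiT correction field:

```python
import numpy as np

def refine_latents(decoded, velocity_fn, num_steps=8):
    """Flow-matching style refinement: instead of denoising from pure
    Gaussian noise, start from the decoded (degraded) latents and
    integrate a learned correction field over a few Euler steps."""
    z = decoded.copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt              # scalar time in [0, 1)
        z = z + dt * velocity_fn(z, t)  # Euler update toward the clean latent
    return z

# Toy stand-in for the learned velocity: pull latents toward a
# hypothetical "clean" target, mimicking a learned correction term.
clean = np.ones((2, 3))             # pretend ground-truth latent
decoded = clean + 0.5               # compression-degraded latent
velocity = lambda z, t: clean - z   # residual-shrinking correction field

refined = refine_latents(decoded, velocity, num_steps=8)
```

Each Euler step multiplies the residual by (1 − dt), so the refined latents end up strictly closer to the clean target than the decoded ones; in GNVC-VD this role is played by the sequence-level video diffusion transformer.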
Related papers
- Free-GVC: Towards Training-Free Extreme Generative Video Compression with Temporal Coherence [30.812937732503457]
Free-GVC is a training-free generative video compression framework. Our method operates at the group-of-pictures level, encoding video segments into a compact latent space. Experiments show that Free-GVC achieves an average 93.29% BD-Rate reduction in DISTS over the recent neural codec DCVC-RT.
arXiv Detail & Related papers (2026-02-10T15:12:51Z) - YODA: Yet Another One-step Diffusion-based Video Compressor [55.356234617448905]
One-step diffusion models have recently excelled in perceptual image compression, but their application to video remains limited. We present YODA (Yet another One-step Diffusion-based video compressor), which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial correlations. YODA achieves state-of-the-art perceptual performance, consistently outperforming deep-learning baselines on LPIPS, DISTS, FID, and KID.
arXiv Detail & Related papers (2026-01-03T10:12:07Z) - Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models [11.913945404405865]
Most video diffusion models (VDMs) generate videos autoregressively, producing subsequent frames conditioned on previous ones. We propose Adaptive Begin-of-Video Tokens (ada-BOV) for autoregressive VDMs.
arXiv Detail & Related papers (2025-11-15T08:29:14Z) - Generative Latent Video Compression [26.99743586846841]
We present Generative Latent Video Compression (GLVC), an effective framework for perceptual video compression. GLVC employs a pretrained continuous tokenizer to project video frames into a perceptually aligned latent space. We show GLVC achieves state-of-the-art performance in terms of DISTS and LPIPS metrics.
arXiv Detail & Related papers (2025-10-11T03:28:49Z) - Nuclear Diffusion Models for Low-Rank Background Suppression in Videos [20.045809197071204]
Nuclear Diffusion is evaluated on a real-world medical imaging problem, namely cardiac ultrasound dehazing. Results highlight the potential of combining model-based temporal modeling with deep generative priors for high-fidelity video restoration.
arXiv Detail & Related papers (2025-09-25T08:20:22Z) - DiTVR: Zero-Shot Diffusion Transformer for Video Restoration [48.97196894658511]
DiTVR is a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a flow-consistent sampler. Our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. The flow-guided sampler injects data consistency only into low-frequency bands, preserving high-frequency priors while accelerating inference.
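The idea of injecting data consistency only into low-frequency bands can be sketched with a simple frequency-domain mix; this is a hypothetical illustration of the technique, not DiTVR's actual sampler, and `cutoff` is an assumed parameter:

```python
import numpy as np

def lowfreq_data_consistency(x_prior, y_measured, cutoff=0.25):
    """Replace the low-frequency bands of a diffusion-prior estimate
    with those of the measurement, keeping the prior's high-frequency
    detail (a sketch of band-limited data consistency)."""
    X = np.fft.fft2(x_prior)
    Y = np.fft.fft2(y_measured)
    h, w = x_prior.shape
    fy = np.fft.fftfreq(h)[:, None]      # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]      # horizontal frequencies
    low = (np.abs(fy) < cutoff) & (np.abs(fx) < cutoff)  # low-frequency mask
    X_mixed = np.where(low, Y, X)        # measurement drives low bands only
    return np.fft.ifft2(X_mixed).real

rng = np.random.default_rng(0)
y = np.full((32, 32), 0.5)                    # smooth (low-frequency) measurement
x = y + 0.1 * rng.standard_normal((32, 32))   # prior estimate with fine detail
out = lowfreq_data_consistency(x, y)
```

After mixing, the DC component (and all low bands) come from the measurement, so the output mean matches the measurement's mean while the prior's high-frequency content is preserved.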
arXiv Detail & Related papers (2025-08-11T09:54:45Z) - Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model [55.2480439325792]
We propose a hybrid compression scheme optimized for perceptual quality, extending the approach of the CDC model with a decoder network. We achieve up to +2 dB PSNR fidelity improvements while maintaining LPIPS and FID perceptual scores comparable to CDC.
arXiv Detail & Related papers (2025-05-19T14:13:14Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - Gated Recurrent Unit for Video Denoising [5.515903319513226]
We propose GRU-VD, a new video denoising model based on gated recurrent unit (GRU) mechanisms.
The experimental results show that the GRU-VD network can achieve better quality than the state of the art, both objectively and subjectively.
arXiv Detail & Related papers (2022-10-17T14:34:54Z) - Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance.
More specifically, the proposed algorithm extracts features from two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance over recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.