Temporal-Consistent Video Restoration with Pre-trained Diffusion Models
- URL: http://arxiv.org/abs/2503.14863v1
- Date: Wed, 19 Mar 2025 03:41:56 GMT
- Title: Temporal-Consistent Video Restoration with Pre-trained Diffusion Models
- Authors: Hengkang Wang, Yang Liu, Huidong Liu, Chien-Chih Wang, Yanhui Guo, Hongdong Li, Bryan Wang, Ju Sun
- Abstract summary: Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
- Score: 51.47188802535954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video restoration (VR) aims to recover high-quality videos from degraded ones. Although recent zero-shot VR methods using pre-trained diffusion models (DMs) show promise, they suffer from approximation errors during reverse diffusion and insufficient temporal consistency. Moreover, because it operates on 3D video data, VR is inherently computationally intensive. In this paper, we advocate viewing the reverse process in DMs as a function and present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors. We also introduce strategies to promote bilevel temporal consistency: semantic consistency by leveraging clustering structures in the seed space, and pixel-level consistency by progressive warping with optical flow refinements. Extensive experiments on multiple video restoration tasks demonstrate superior visual quality and temporal consistency achieved by our method compared to the state-of-the-art.
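To make the seed-space idea concrete, the reverse process of the DM can be treated as a deterministic function G from seeds to frames, and the MAP objective optimized directly over the seeds. Below is a minimal PyTorch-style sketch; `reverse_diffusion`, the degradation operator `A`, the flow-based `warp`, and all weights are hypothetical stand-ins, not the paper's actual interface:

```python
import torch

def map_restore(y, reverse_diffusion, A, warp, seed_shape,
                n_iters=500, lam=0.1):
    """MAP video restoration in the seed space of a DM (sketch).

    y                 : degraded frames, shape (T, C, H, W)
    reverse_diffusion : deterministic map G from seeds z to clean frames
    A                 : known degradation operator (e.g., blur + downsample)
    warp              : warps frames t-1 onto frames t via optical flow
    seed_shape        : shape of the per-frame seeds in the DM's seed space
    """
    z = torch.randn(seed_shape, requires_grad=True)   # one seed per frame
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(n_iters):
        x = reverse_diffusion(z)                      # G(z): seeds -> frames
        data_term = ((A(x) - y) ** 2).mean()          # fidelity (likelihood)
        # pixel-level temporal consistency: each frame should agree with
        # its flow-warped predecessor
        temp_term = ((x[1:] - warp(x[:-1], x[1:])) ** 2).mean()
        # (semantic consistency via clustering in the seed space is omitted)
        loss = data_term + lam * temp_term
        opt.zero_grad()
        loss.backward()
        opt.step()
    return reverse_diffusion(z).detach()
```

In this reading, the data term plays the role of the likelihood and the warping term enforces pixel-level consistency; because the seeds are optimized directly, no approximate guidance step is needed during reverse diffusion.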
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
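As a rough illustration of the one-step idea (not OSDD's actual interface), a single call to the denoiser replaces the usual multi-step reverse loop; `denoiser` and `scheduler_step` here are hypothetical:

```python
import torch

@torch.no_grad()
def one_step_deblur(y_blurry, denoiser, T=999):
    """One-step restoration (sketch): a single denoiser call replaces the
    full reverse-diffusion loop."""
    t = torch.full((y_blurry.shape[0],), T, device=y_blurry.device)
    return denoiser(y_blurry, t)      # direct prediction of the clean image

@torch.no_grad()
def multi_step_deblur(y_blurry, denoiser, scheduler_step, T=999):
    """Conventional multi-step baseline, shown for contrast."""
    x = y_blurry
    for t_step in reversed(range(T + 1)):
        t = torch.full((x.shape[0],), t_step, device=x.device)
        x = scheduler_step(x, denoiser(x, t), t_step)  # one reverse update
    return x
```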
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model.
The model is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss.
Even a scaled-down version of CDT (3× inference speedup) still performs comparably to top baselines.
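A hedged sketch of how such a composite objective might be assembled; the encoder/decoder interfaces, loss weights, and the simplified noising step are placeholders rather than CDT's actual implementation:

```python
import torch
import torch.nn.functional as F

def cdt_loss(x, encoder, decoder_denoiser, lpips_fn, w_kl=1e-6, w_lpips=1.0):
    """Composite tokenizer loss (sketch): MSE diffusion loss + KL + LPIPS."""
    mu, logvar = encoder(x)                        # latent tokens
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    # conditional diffusion decoder: denoise a noised copy of x given z
    t = torch.randint(0, 1000, (x.shape[0],), device=x.device)
    x_noisy = x + torch.randn_like(x)              # noise schedule omitted
    x_rec = decoder_denoiser(x_noisy, t, cond=z)
    mse = F.mse_loss(x_rec, x)                     # basic MSE diffusion loss
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    lpips = lpips_fn(x_rec, x).mean()              # perceptual term
    return mse + w_kl * kl + w_lpips * lpips
```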
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
- SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration [73.70209718408641]
SeedVR is a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. It achieves highly competitive performance on both synthetic and real-world benchmarks, as well as on AI-generated videos.
arXiv Detail & Related papers (2025-01-02T16:19:48Z)
- Zero-Shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model [15.170889156729777]
We propose the first framework for zero-shot video restoration and enhancement based on a pre-trained image diffusion model. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement method.
arXiv Detail & Related papers (2024-07-02T05:31:59Z)
- Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The success of video super-resolution (VSR) methods stems mainly from how they exploit spatial and temporal information.
Inaccurate alignment usually leads to aligned features with significant artifacts.
Existing propagation modules only propagate features from the same timestep forward or backward.
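For context, a generic bidirectional propagation baseline of the kind being critiqued looks roughly like this (the recurrent cells are hypothetical):

```python
import torch

def bidirectional_propagate(feats, fwd_cell, bwd_cell):
    """Generic bidirectional feature propagation (sketch).

    feats              : list of per-frame feature tensors (N, C, H, W)
    fwd_cell, bwd_cell : recurrent modules fusing (features, hidden) -> hidden
    """
    T = len(feats)
    h = torch.zeros_like(feats[0])
    forward = []
    for t in range(T):                      # forward pass through time
        h = fwd_cell(feats[t], h)
        forward.append(h)
    h = torch.zeros_like(feats[0])
    backward = [None] * T
    for t in reversed(range(T)):            # backward pass through time
        h = bwd_cell(feats[t], h)
        backward[t] = h
    # fuse the two temporal directions per frame
    return [torch.cat([f, b], dim=1) for f, b in zip(forward, backward)]
```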
arXiv Detail & Related papers (2024-04-06T22:08:20Z)
- ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations [13.38405890753946]
We introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT).
ConVRT is a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration.
A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision.
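One plausible way to read the CLIP supervision is as a semantic temporal-consistency penalty on restored frames; the sketch below is an assumption about the mechanism, not ConVRT's exact formulation:

```python
import torch.nn.functional as F

def clip_temporal_loss(frames, clip_image_encoder):
    """Semantic temporal-consistency penalty (sketch): neighboring restored
    frames should stay close in CLIP embedding space.

    frames : (T, C, H, W), already CLIP-preprocessed
    """
    emb = F.normalize(clip_image_encoder(frames), dim=-1)   # (T, D)
    # cosine distance between consecutive frames' embeddings
    return (1 - (emb[:-1] * emb[1:]).sum(dim=-1)).mean()
```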
arXiv Detail & Related papers (2023-12-07T20:19:48Z)
- Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
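A minimal sketch of such a motion-guided loss, with hypothetical `decode`, `flows`, and `warp` helpers; in guided-diffusion style, its gradient with respect to the latents would be folded into each sampling step:

```python
def motion_guided_loss(latents, decode, flows, warp):
    """Motion-guided consistency loss on the latent sampling path (sketch).

    latents : (T, C, H, W) current latent estimates for all frames
    decode  : VAE decoder mapping latents to RGB frames
    flows   : optical flow t-1 -> t, estimated on the LR input video
    warp    : backward-warps frames t-1 onto frames t using the flow
    """
    frames = decode(latents)                    # (T, 3, H', W')
    warped_prev = warp(frames[:-1], flows)      # align each frame t-1 to t
    return ((frames[1:] - warped_prev) ** 2).mean()
```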
arXiv Detail & Related papers (2023-12-01T14:40:07Z)
- Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models [17.570136632211693]
We present StableVSR, a VSR method based on DMs that can enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details.
We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR.
arXiv Detail & Related papers (2023-11-27T15:14:38Z)
- DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos [20.895221536570627]
Human mesh recovery (HMR) provides rich human body information for various real-world applications. Video-based approaches leverage temporal information to mitigate this issue. We present DiffMesh, an innovative motion-aware diffusion-like framework for video-based HMR.
arXiv Detail & Related papers (2023-03-23T16:15:18Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)