Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models
- URL: http://arxiv.org/abs/2510.25420v1
- Date: Wed, 29 Oct 2025 11:40:06 GMT
- Title: Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models
- Authors: Nasrin Rahimi, A. Murat Tekalp
- Abstract summary: We address the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models. We propose two complementary inference-time strategies: Perceptual Straightening Guidance (PSG) and Multi-Path Ensemble Sampling (MPES).
- Score: 5.61537470581101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have emerged as powerful priors for single-image restoration, but their application to zero-shot video restoration suffers from temporal inconsistencies due to the stochastic nature of sampling and the complexity of incorporating explicit temporal modeling. In this work, we address the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models without retraining or modifying their architecture. We propose two complementary inference-time strategies: (1) Perceptual Straightening Guidance (PSG) based on the neuroscience-inspired perceptual straightening hypothesis, which steers the diffusion denoising process towards smoother temporal evolution by incorporating a curvature penalty in a perceptual space to improve temporal perceptual scores, such as Fréchet Video Distance (FVD) and perceptual straightness; and (2) Multi-Path Ensemble Sampling (MPES), which aims at reducing stochastic variation by ensembling multiple diffusion trajectories to improve fidelity (distortion) scores, such as PSNR and SSIM, without sacrificing sharpness. Together, these training-free techniques provide a practical path toward temporally stable high-fidelity perceptual video restoration using large pretrained diffusion models. We performed extensive experiments over multiple datasets and degradation types, systematically evaluating each strategy to understand their strengths and limitations. Our results show that while PSG enhances temporal naturalness, particularly in the case of temporal blur, MPES consistently improves fidelity and the spatio-temporal perception-distortion trade-off across all tasks.
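The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The curvature measure below is the usual angle between successive displacement vectors of a frame-feature trajectory, and the feature vectors are assumed to come from some perceptual embedding that is not modeled here. The abstract does not say how MPES fuses trajectories, so the plain mean in `ensemble_mean` is an assumption used only to show why ensembling reduces stochastic error. The function names `mean_curvature` and `ensemble_mean` are hypothetical.

```python
import numpy as np


def mean_curvature(features: np.ndarray) -> float:
    """Mean angle (radians) between successive displacement vectors.

    features: (T, D) array, one feature vector per frame, T >= 3.
    A perfectly straight trajectory gives 0; sharper turns give larger values.
    """
    diffs = np.diff(features, axis=0)                      # (T-1, D) displacements
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    unit = diffs / np.maximum(norms, 1e-12)                # avoid division by zero
    cos = np.sum(unit[:-1] * unit[1:], axis=1)             # cosine of each turn
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return float(angles.mean())


def ensemble_mean(samples: np.ndarray) -> np.ndarray:
    """Average K candidate restorations; samples has shape (K, ...).

    Assumption: a plain mean stands in for the unspecified MPES fusion rule.
    """
    return samples.mean(axis=0)


# Toy trajectories in a stand-in "perceptual" space.
t = np.linspace(0.0, 1.0, 8)[:, None]
straight = t * np.array([[1.0, 2.0, 3.0]])                 # moves along one direction
zigzag = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]], dtype=float)

curv_straight = mean_curvature(straight)
curv_zigzag = mean_curvature(zigzag)

# Toy ensemble: K noisy samples around a known clean frame.
rng = np.random.default_rng(0)
clean = np.zeros((4, 4))
samples = clean + rng.normal(scale=0.5, size=(16, 4, 4))
mse_single = float(np.mean((samples - clean) ** 2))        # average per-sample MSE
mse_ensemble = float(np.mean((ensemble_mean(samples) - clean) ** 2))
```

A guidance term in the spirit of PSG would penalize a quantity like `mean_curvature` during sampling, while the ensemble comparison shows that the averaged estimate is never further from the clean frame (in MSE) than the samples are on average. Plain averaging of final outputs can blur detail, so it should not be read as the paper's procedure, which is described as preserving sharpness.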
Related papers
- D$^2$-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy [7.553742541566094]
The integration of diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering fantastic perceptual quality. We propose D$^2$-VR, a single-image diffusion-based video-restoration framework with low-step inference.
arXiv Detail & Related papers (2026-02-09T08:52:51Z)
- LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration [3.2944592608677614]
We propose LVTINO, the first zero-shot or plug-and-play inverse solver for high definition video restoration with priors encoded by VCMs. Our conditioning mechanism bypasses the need for automatic differentiation and achieves state-of-the-art video reconstruction quality with only a few neural function evaluations.
arXiv Detail & Related papers (2025-10-01T18:10:08Z)
- Harnessing Diffusion-Yielded Score Priors for Image Restoration [29.788482710572307]
Deep image restoration models aim to learn a mapping from degraded image space to natural image space. Three major classes of methods have emerged, including MSE-based, GAN-based, and diffusion-based methods. We propose a novel method, HYPIR, to address these challenges.
arXiv Detail & Related papers (2025-07-28T07:55:34Z)
- Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel maximum a posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Solving Video Inverse Problems Using Image Diffusion Models [58.464465016269614]
We introduce an innovative video inverse solver that leverages only image diffusion models. Our method treats the time dimension of a video as the batch dimension of image diffusion models. We also introduce a batch-consistent sampling strategy that encourages consistency across batches.
arXiv Detail & Related papers (2024-09-04T09:48:27Z)
- Zero-Shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model [15.170889156729777]
We propose the first framework for zero-shot video restoration and enhancement based on the pre-trained image diffusion model. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement method.
arXiv Detail & Related papers (2024-07-02T05:31:59Z)
- Deep Equilibrium Diffusion Restoration with Parallel Sampling [120.15039525209106]
Diffusion model-based image restoration (IR) aims to use diffusion models to recover high-quality (HQ) images from degraded images, achieving promising performance.
Most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs.
In this work, we aim to rethink the diffusion model-based IR models through a different perspective, i.e., a deep equilibrium (DEQ) fixed point system, called DeqIR.
arXiv Detail & Related papers (2023-11-20T08:27:56Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Towards performant and reliable undersampled MR reconstruction via diffusion model sampling [67.73698021297022]
DiffuseRecon is a novel diffusion model-based MR reconstruction method.
It guides the generation process based on the observed signals.
It does not require additional training on specific acceleration factors.
arXiv Detail & Related papers (2022-03-08T02:25:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.