Fugu-MT Paper Translation (Abstract): Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models

Paper summary: Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models

arxiv url: http://arxiv.org/abs/2510.25420v1
Date: Wed, 29 Oct 2025 11:40:06 GMT
Status: Translation complete
Last updated in system: 2025-10-30 15:50:45.487787
Title: Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models
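Title (reference translation): Improving temporal consistency and inference-time fidelity in perceptual video restoration with zero-shot image-based diffusion models
Authors: Nasrin Rahimi, A. Murat Tekalp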
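Abstract summary: The paper addresses the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models. It proposes two complementary inference-time strategies: Perceptual Straightening Guidance (PSG) and Multi-Path Ensemble Sampling (MPES).
Reference score (independently computed attention score): 5.61537470581101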
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have emerged as powerful priors for single-image restoration, but their application to zero-shot video restoration suffers from temporal inconsistencies due to the stochastic nature of sampling and the complexity of incorporating explicit temporal modeling. In this work, we address the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models without retraining or modifying their architecture. We propose two complementary inference-time strategies: (1) Perceptual Straightening Guidance (PSG) based on the neuroscience-inspired perceptual straightening hypothesis, which steers the diffusion denoising process towards smoother temporal evolution by incorporating a curvature penalty in a perceptual space to improve temporal perceptual scores, such as Fréchet Video Distance (FVD) and perceptual straightness; and (2) Multi-Path Ensemble Sampling (MPES), which aims at reducing stochastic variation by ensembling multiple diffusion trajectories to improve fidelity (distortion) scores, such as PSNR and SSIM, without sacrificing sharpness. Together, these training-free techniques provide a practical path toward temporally stable high-fidelity perceptual video restoration using large pretrained diffusion models. We performed extensive experiments over multiple datasets and degradation types, systematically evaluating each strategy to understand their strengths and limitations. Our results show that while PSG enhances temporal naturalness, particularly in the case of temporal blur, MPES consistently improves fidelity and the spatio-temporal perception-distortion trade-off across all tasks.
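Abstract (reference translation): Diffusion models have emerged as powerful priors for single-image restoration, but applying them to zero-shot video restoration leads to temporal inconsistency, owing to the stochastic nature of sampling and the complexity of incorporating explicit temporal modeling. This work addresses the challenge of improving temporal coherence in video restoration with zero-shot image-based diffusion models, without retraining them or modifying their architecture. Two complementary inference-time strategies are proposed. Perceptual Straightening Guidance (PSG), based on the neuroscience-inspired perceptual straightening hypothesis, steers the diffusion denoising process toward smoother temporal evolution by incorporating a curvature penalty in a perceptual space, in order to improve temporal perceptual scores such as Fréchet Video Distance (FVD) and perceptual straightness. Multi-Path Ensemble Sampling (MPES) reduces stochastic variation by ensembling multiple diffusion trajectories, in order to improve fidelity (distortion) scores such as PSNR and SSIM without sacrificing sharpness. Together, these training-free techniques provide a practical path toward temporally stable, high-fidelity perceptual video restoration using large pretrained diffusion models. Extensive experiments over multiple datasets and degradation types systematically evaluate each strategy to understand its strengths and limitations. The results show that PSG enhances temporal naturalness, particularly for temporal blur, while MPES consistently improves fidelity and the spatio-temporal perception-distortion trade-off across all tasks.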
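Note: the two ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each frame has already been mapped to a feature vector in some perceptual space (the abstract does not say which), measures curvature as the angle between successive displacement vectors, and stands in for ensembling with a plain element-wise mean. The function names `curvature_angles`, `mean_curvature`, and `ensemble_mean` are hypothetical.

```python
import math


def curvature_angles(features):
    """Angles (radians) between successive displacement vectors.

    `features` holds one equal-length vector per frame, assumed to come
    from a perceptual embedding (hypothetical; not specified in the abstract).
    """
    # Displacement between consecutive frames in the embedding space.
    diffs = [[b - a for a, b in zip(f0, f1)]
             for f0, f1 in zip(features, features[1:])]
    angles = []
    for d0, d1 in zip(diffs, diffs[1:]):
        n0 = math.sqrt(sum(x * x for x in d0))
        n1 = math.sqrt(sum(x * x for x in d1))
        if n0 == 0.0 or n1 == 0.0:
            angles.append(0.0)  # static frames: treat as zero curvature
            continue
        cos = sum(x * y for x, y in zip(d0, d1)) / (n0 * n1)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        angles.append(math.acos(cos))
    return angles


def mean_curvature(features):
    """Average curvature of a trajectory; lower means 'straighter'."""
    angles = curvature_angles(features)
    return sum(angles) / len(angles) if angles else 0.0


def ensemble_mean(paths):
    """Element-wise mean over several sampled restorations of one frame.

    A plain average is an assumption made for illustration; the abstract
    only says that multiple diffusion trajectories are ensembled.
    """
    n = len(paths)
    return [sum(vals) / n for vals in zip(*paths)]
```

In this sketch a trajectory moving along a straight line has mean curvature 0, while a single right-angle turn gives π/2. A guidance term in the spirit of PSG would penalize large values of this quantity during sampling.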


Related papers