Fugu-MT Paper Translation (Summary): Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

Paper summary: Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

arxiv url: http://arxiv.org/abs/2511.14719v1
Date: Tue, 18 Nov 2025 18:06:29 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.252137
Title: Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising
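Title (reference translation): Enhancing Zero-shot Synthetic Video Realism via Structure-aware Denoising
Authors: Yifan Wang, Liya Ji, Zhanghan Ke, Harry Yang, Ser-Nam Lim, Qifeng Chen
Abstract summary: This paper proposes a method for enhancing the realism of synthetic video by re-rendering synthetic videos from a simulator in a photorealistic manner. The framework focuses on preserving multi-level structure from the synthetic video in the enhanced video across both the spatial and temporal domains.
Reference score (attention score computed by Fugu-MT): 83.09163795450407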
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose an approach to enhancing synthetic video realism, which can re-render synthetic videos from a simulator in photorealistic fashion. Our realism enhancement approach is a zero-shot framework that focuses on preserving the multi-level structures of synthetic videos in the enhanced videos in both the spatial and temporal domains, and it is built upon a video diffusion foundation model without further fine-tuning. Specifically, we incorporate an effective modification so that the generation/denoising process is conditioned on structure-aware information estimated from the synthetic video by an auxiliary model, such as depth maps, semantic maps, and edge maps, rather than on information extracted from a simulator. This guidance ensures that the enhanced videos are consistent with the original synthetic video at both the structural and semantic levels. Our approach is a simple yet general and powerful way to enhance synthetic video realism: our experiments show that it outperforms existing baselines in structural consistency with the original video while maintaining state-of-the-art photorealism quality.
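Abstract (reference translation): This paper proposes a method for enhancing synthetic video realism by re-rendering synthetic videos from a simulator in a photorealistic manner. Our realism enhancement approach is a zero-shot framework focused on preserving multi-level structure from the synthetic video in the enhanced video across the spatial and temporal domains, and it is built on a video diffusion foundation model without additional fine-tuning. Specifically, rather than extracting information from the simulator, we use an auxiliary model to estimate structure-aware information from the synthetic video, such as depth maps, semantic maps, and edge maps, and condition the generation/denoising process on it. This guidance ensures that the enhanced video matches the original synthetic video at both the structural and semantic levels. Our approach is a simple, general, and powerful way to enhance synthetic video realism, and our experiments show that it outperforms existing baselines in structural consistency with the original video while maintaining state-of-the-art photorealism.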
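The abstract describes the pipeline only at a high level. Below is a minimal sketch of the control flow it implies, under our own assumptions: auxiliary estimators produce per-frame structure maps (depth, semantic, edge), these are stacked into a conditioning tensor, and a frozen noise predictor receives that tensor at every step of a deterministic DDIM (eta = 0) sampler, with no weights updated. The names `estimate_structure`, `ddim_sample`, `oracle_eps`, the choice of DDIM, the gradient-magnitude edge map, and the toy estimators are all hypothetical illustrations, not the authors' implementation; the paper's actual conditioning mechanism inside its video diffusion model is not described here.

```python
import numpy as np
from typing import Callable, Dict

# Hypothetical type: an auxiliary estimator maps one RGB frame (H, W, 3) to a 2D map (H, W).
Estimator = Callable[[np.ndarray], np.ndarray]


def edge_map(frame: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map of the grayscale frame (a simple stand-in for a real edge detector)."""
    gray = frame.mean(axis=-1)
    gy, gx = np.gradient(gray)  # derivatives along rows and columns
    return np.hypot(gx, gy)


def estimate_structure(video: np.ndarray, estimators: Dict[str, Estimator]) -> np.ndarray:
    """Run every auxiliary estimator on every frame; returns (T, H, W, K), one channel per map."""
    names = sorted(estimators)  # fixed channel order
    per_frame = [np.stack([estimators[n](f) for n in names], axis=-1) for f in video]
    return np.stack(per_frame, axis=0)


def ddim_sample(eps_model, cond, shape, alpha_bars, rng):
    """Deterministic DDIM (eta = 0) sampling with a frozen, condition-aware noise predictor.

    alpha_bars: cumulative products (alpha-bar) for the visited timesteps, noisiest first.
    The model is only queried, never trained, which is what "zero-shot" means here.
    """
    x = rng.standard_normal(shape)
    for i, a_t in enumerate(alpha_bars):
        a_prev = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        eps = eps_model(x, a_t, cond)  # hypothetical signature: (noisy sample, alpha-bar, condition)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x


# Toy usage with stand-ins (all hypothetical, for illustration only).
rng = np.random.default_rng(0)
synthetic = rng.random((4, 16, 16, 3))  # T=4 frames of 16x16 RGB "simulator" output
estimators = {
    "depth": lambda f: f.mean(axis=-1),                     # placeholder for a depth model
    "semantic": lambda f: (f[..., 0] > 0.5).astype(float),  # placeholder for a segmentation model
    "edge": edge_map,
}
cond = estimate_structure(synthetic, estimators)


def oracle_eps(x, a_t, cond):
    # Toy predictor that treats the conditioning maps as the clean sample, so sampling
    # reproduces them exactly. A real system would call a video diffusion network here.
    return (x - np.sqrt(a_t) * cond) / np.sqrt(1.0 - a_t)


alpha_bars = np.linspace(0.01, 0.99, 10)
out = ddim_sample(oracle_eps, cond, cond.shape, alpha_bars, rng)
```

The oracle predictor exists only to make the sampler checkable: with it, the DDIM update recovers the conditioning tensor exactly. In a real setup the structure maps would instead steer a pretrained video model toward outputs whose depth, layout, and edges match the synthetic input.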

