Fugu-MT Paper Translation (Overview): RealMaster: Lifting Rendered Scenes into Photorealistic Video

Paper overview: RealMaster: Lifting Rendered Scenes into Photorealistic Video

arXiv URL: http://arxiv.org/abs/2603.23462v1
Date: Tue, 24 Mar 2026 17:32:42 GMT
Status: Translation complete
Last updated (in system): 2026-03-25 19:53:37.613313
Title: RealMaster: Lifting Rendered Scenes into Photorealistic Video
Authors: Dana Cohen-Bar, Ido Sobol, Raphael Bensadoun, Shelly Sheynin, Oran Gafni, Or Patashnik, Daniel Cohen-Or, Amit Zohar
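Abstract summary: State-of-the-art video generation models produce remarkable photorealism, but they lack the precise control needed to align generated content with scene requirements. We present RealMaster, a method that uses video diffusion models to lift rendered video into photorealistic video while maintaining full alignment with the output of the 3D engine. RealMaster significantly outperforms existing video editing baselines, improving photorealism while preserving the geometry, dynamics, and identity specified by the original 3D control.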
Reference score (in-house attention score): 55.04231137698114
License: http://creativecommons.org/licenses/by/4.0/
Abstract: State-of-the-art video generation models produce remarkable photorealism, but they lack the precise control required to align generated content with specific scene requirements. Furthermore, without an underlying explicit geometry, these models cannot guarantee 3D consistency. Conversely, 3D engines offer granular control over every scene element and provide native 3D consistency by design, yet their output often remains trapped in the "uncanny valley". Bridging this sim-to-real gap requires both structural precision, where the output must exactly preserve the geometry and dynamics of the input, and global semantic transformation, where materials, lighting, and textures must be holistically transformed to achieve photorealism. We present RealMaster, a method that leverages video diffusion models to lift rendered video into photorealistic video while maintaining full alignment with the output of the 3D engine. To train this model, we generate a paired dataset via an anchor-based propagation strategy, where the first and last frames are enhanced for realism and propagated across the intermediate frames using geometric conditioning cues. We then train an IC-LoRA on these paired videos to distill the high-quality outputs of the pipeline into a model that generalizes beyond the pipeline's constraints, handling objects and characters that appear mid-sequence and enabling inference without requiring anchor frames. Evaluated on complex GTA-V sequences, RealMaster significantly outperforms existing video editing baselines, improving photorealism while preserving the geometry, dynamics, and identity specified by the original 3D control.
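The abstract states that RealMaster trains an IC-LoRA, a variant of low-rank adaptation (LoRA), but gives no architectural details. The sketch below therefore illustrates only plain LoRA on a single linear layer, in pure Python. The base weight W stays frozen and a rank-r update (alpha / r) · B · A is trained. B is zero-initialized, so the adapted layer initially behaves exactly like the base layer. It does not model IC-LoRA's in-context conditioning, the video diffusion backbone, or the authors' training setup. The names `LoRALinear`, `matvec`, `rank`, `alpha`, and `seed` are hypothetical illustrative choices.

```python
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: (rank, d_in) with small random init; B: (d_out, rank) with zero init,
        # so the adapted layer starts out identical to the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        """Compute W @ x + scale * B @ (A @ x) without materializing B @ A."""
        base = matvec(self.weight, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def merged_weight(self) -> Matrix:
        """Fold the low-rank update into W for inference: W + scale * B @ A."""
        rank = len(self.A)
        return [
            [
                w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(rank))
                for j, w in enumerate(row)
            ]
            for i, row in enumerate(self.weight)
        ]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
x = [1.0, 2.0, 3.0]
layer = LoRALinear(W, rank=1, alpha=1.0)
print(layer.forward(x))  # [7.0, -1.0]: B starts at zero, so this equals W @ x

# Pretend an optimizer has updated B; the forward pass and merged weight still agree.
layer.B = [[0.5], [-0.25]]
merged_out = matvec(layer.merged_weight(), x)
```

Only A and B would be trained, which is why LoRA-style fine-tuning touches far fewer parameters than updating W directly. After training, the update can be merged into W so inference costs the same as the base layer.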


Related papers