Fugu-MT Paper Translation (Abstract): I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation

Paper overview: I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation

arxiv url: http://arxiv.org/abs/2603.23413v1
Date: Tue, 24 Mar 2026 16:45:40 GMT
Status: Translation complete
System update date: 2026-03-25 19:53:37.591112
Title: I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation
Title (reference translation): I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation
Authors: Jia Li, Han Yan, Yihang Chen, Siqi Li, Xibin Song, Yifu Wang, Jianfei Cai, Tien-Tsin Wong, Pan Ji
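Abstract summary: I3DM is an implicit 3D-aware memory mechanism for consistent video scene generation. At the core of our approach is a 3D-aware memory retrieval strategy. To fully utilize the retrieved historical frames, we introduce a 3D-aligned memory injection module.
Reference score (independently computed attention score): 56.33710337846449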
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite remarkable progress in video generation, maintaining long-term scene consistency upon revisiting previously explored areas remains challenging. Existing solutions rely either on explicitly constructing 3D geometry, which suffers from error accumulation and scale ambiguity, or on naive camera Field-of-View (FoV) retrieval, which typically fails under complex occlusions. To overcome these limitations, we propose I3DM, a novel implicit 3D-aware memory mechanism for consistent video scene generation that bypasses explicit 3D reconstruction. At the core of our approach is a 3D-aware memory retrieval strategy, which leverages the intermediate features of a pre-trained Feed-Forward Novel View Synthesis (FF-NVS) model to score view relevance, enabling robust retrieval even in highly occluded scenarios. Furthermore, to fully utilize the retrieved historical frames, we introduce a 3D-aligned memory injection module. This module implicitly warps historical content to the target view and adaptively conditions the generation on reliable warping regions, leading to improved revisit consistency and accurate camera control. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, achieving superior revisit consistency, generation fidelity, and camera control precision.
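Abstract (reference translation): Despite remarkable progress in video generation, maintaining long-term scene consistency when revisiting previously explored areas remains difficult. Existing solutions either explicitly construct 3D geometry, which suffers from error accumulation and scale ambiguity, or rely on camera Field-of-View (FoV) retrieval, which fails under complex occlusions. To overcome these limitations, we propose I3DM, a new implicit 3D-aware memory mechanism for consistent video scene generation that avoids explicit 3D reconstruction. The 3D-aware memory retrieval strategy at the core of the proposed method leverages the intermediate features of a pre-trained Feed-Forward Novel View Synthesis (FF-NVS) model to enable robust retrieval even in highly occluded scenarios. Furthermore, to make full use of the retrieved historical frames, we introduce a 3D-aligned memory injection module. This module implicitly warps historical content to the target view and adaptively conditions generation on reliable warped regions, improving revisit consistency and camera control accuracy. Extensive experiments show that the method outperforms state-of-the-art approaches, achieving better revisit consistency, generation fidelity, and camera control precision.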
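
The abstract describes retrieving history frames by scoring view relevance from the intermediate features of a pre-trained FF-NVS model, but it does not state the scoring function. The following is only a minimal sketch of the general idea of score-based top-k memory retrieval. It assumes each frame has already been reduced to a single feature vector and uses cosine similarity as a stand-in score; the names `cosine` and `retrieve_memory` and the toy vectors are hypothetical and do not come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_memory(target_feat, memory_feats, k=2):
    """Return (indices of the k highest-scoring history frames, all scores).

    ASSUMPTION: cosine similarity stands in for the paper's view-relevance
    score, which the abstract does not specify.
    """
    scores = [cosine(target_feat, feat) for feat in memory_feats]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Toy feature vectors standing in for per-frame features (hypothetical data).
memory = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
target = [1.0, 0.05, 0.0]

top, scores = retrieve_memory(target, memory, k=2)
print(top)
```

In this toy setup the two history frames whose features point in nearly the same direction as the target are selected. A FoV-overlap heuristic would instead rank frames by camera geometry alone, which the abstract notes tends to fail under complex occlusions.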

Related papers