Fugu-MT Paper Translation (Summary): StableWorld: Towards Stable and Consistent Long Interactive Video Generation

Paper summary: StableWorld: Towards Stable and Consistent Long Interactive Video Generation

arxiv url: http://arxiv.org/abs/2601.15281v1
Date: Wed, 21 Jan 2026 18:59:02 GMT
Status: Translation complete
Last updated in system: 2026-01-22 21:27:50.507205
Title: StableWorld: Towards Stable and Consistent Long Interactive Video Generation
Title (reference translation): StableWorld: Toward Stable and Consistent Interactive Video Generation
Authors: Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Ziwei Liu, Chenyang Si
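Abstract summary: This work examines the challenge of stability and temporal consistency in interactive video generation. It proposes StableWorld, a Dynamic Frame Eviction Mechanism. StableWorld effectively prevents cumulative drift at its source and improves the stability and temporal consistency of interactive generation.
Reference score (independently computed attention score): 45.597087309159456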
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we explore the overlooked challenge of stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods still suffer from severe instability and temporal degradation, often leading to spatial drift and scene collapse during long-horizon interactions. To better understand this issue, we first investigate the underlying causes of instability and identify that the major source of error accumulation originates from the same scene, where generated frames gradually deviate from the initial clean state and propagate errors to subsequent frames. Building upon this observation, we propose a simple yet effective method, StableWorld, a Dynamic Frame Eviction Mechanism. By continuously filtering out degraded frames while retaining geometrically consistent ones, StableWorld effectively prevents cumulative drift at its source, leading to more stable and temporally consistent interactive generation. Promising results on multiple interactive video models, e.g., Matrix-Game, Open-Oasis, and Hunyuan-GameCraft, demonstrate that StableWorld is model-agnostic and can be applied to different interactive video generation frameworks to substantially improve stability, temporal consistency, and generalization across diverse interactive scenarios.
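Abstract (reference translation): This paper examines the difficulty of achieving stability and temporal consistency in interactive video generation, which synthesizes dynamic and controllable video worlds through interactive behaviors such as camera movements and text prompts. Despite remarkable progress in world modeling, current methods still suffer from severe instability and temporal degradation, which often cause spatial drift and scene collapse during long-horizon interactions. To understand this issue more deeply, we first investigate the root causes of the instability and find that the main source of error accumulation lies within the same scene, where generated frames gradually deviate from the initial clean state and propagate errors to later frames. Based on this observation, we propose StableWorld, a simple and effective method built on a Dynamic Frame Eviction Mechanism. By continuously filtering out degraded frames while keeping geometrically consistent ones, StableWorld effectively prevents cumulative drift at its source and improves the stability and temporal consistency of interactive generation. Results on multiple interactive video models, e.g., Matrix-Game, Open-Oasis, and Hunyuan-GameCraft, show that StableWorld is model-agnostic and, when applied to different interactive video generation frameworks, substantially improves stability, temporal consistency, and generalization across diverse interactive scenarios.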

Related papers