Fugu-MT Paper Translation (Abstract): Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Paper overview: Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

arxiv url: http://arxiv.org/abs/2604.08995v2
Date: Mon, 13 Apr 2026 03:48:38 GMT
Status: Translation complete
Last updated in system: 2026-04-14 14:47:45.814104
Title: Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
Title (reference translation): Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
Authors: Zile Wang, Zexiang Liu, Jiaxing Li, Kaichen Huang, Baixin Xu, Fei Kang, Mengyin An, Peiyu Wang, Biao Jiang, Yichen Wei, Yidan Xietian, Jiangbo Pei, Liang Hu, Boyi Jiang, Hua Xue, Zidong Wang, Haofeng Sun, Wei Li, Wanli Ouyang, Xianglong He, Yang Liu, Yangguang Li, Yahui Zhou
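Abstract summary: Matrix-Game 3.0 is a memory-augmented interactive world model designed for 720p real-time video generation. It introduces systematic improvements across data, model, and inference. Experimental results show that Matrix-Game 3.0 achieves up to 40 FPS real-time generation at 720p resolution with a 5B model.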
Reference score (independently calculated attention level): 53.39687409541093
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing approaches still struggle to simultaneously achieve memory-enabled long-term temporal consistency and high-resolution real-time generation, limiting their applicability in real-world scenarios. To address this, we present Matrix-Game 3.0, a memory-augmented interactive world model designed for 720p real-time long-form video generation. Building upon Matrix-Game 2.0, we introduce systematic improvements across data, model, and inference. First, we develop an upgraded industrial-scale infinite data engine that integrates Unreal Engine-based synthetic data, large-scale automated collection from AAA games, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplet data at scale. Second, we propose a training framework for long-horizon consistency: by modeling prediction residuals and re-injecting imperfect generated frames during training, the base model learns self-correction; meanwhile, camera-aware memory retrieval and injection enable the base model to achieve long-horizon spatiotemporal consistency. Third, we design a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder pruning, to achieve efficient real-time inference. Experimental results show that Matrix-Game 3.0 achieves up to 40 FPS real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences. Scaling up to a 2x14B model further improves generation quality, dynamics, and generalization. Our approach provides a practical pathway toward industrial-scale deployable world models.
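Abstract (reference translation): With advances in interactive video generation, diffusion models have increasingly shown their potential as world models. However, existing approaches still struggle to achieve memory-enabled long-term temporal consistency and high-resolution real-time generation at the same time, which limits their applicability in real-world scenarios. To address this, we present Matrix-Game 3.0, a memory-augmented interactive world model designed for 720p real-time long-form video generation. Building on Matrix-Game 2.0, we introduce systematic improvements across data, model, and inference. First, we develop an upgraded industrial-scale infinite data engine that integrates Unreal Engine-based synthetic data, large-scale automated collection from AAA games, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplet data at scale. Second, we propose a training framework for long-horizon consistency: by modeling prediction residuals and re-injecting imperfect generated frames during training, the base model learns self-correction, while camera-aware memory retrieval and injection enable it to achieve long-horizon spatiotemporal consistency. Third, we design a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder pruning, to achieve efficient real-time inference. Experimental results show that Matrix-Game 3.0 achieves up to 40 FPS real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences. Scaling up to a 2x14B model further improves generation quality, dynamics, and generalization. Our approach provides a practical pathway toward industrial-scale deployable world models.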

Related papers