Fugu-MT Paper Translation (Abstract): Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Paper overview: Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

arxiv url: http://arxiv.org/abs/2508.13009v1
Date: Mon, 18 Aug 2025 15:28:53 GMT
Status: Translation complete
System update date: 2025-08-19 14:49:11.451863
Title: Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Title (reference translation): Matrix-Game 2.0 - An Open-Source, Real-Time, Interactive World Model
Authors: Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Cyrus Wu, Wei Li, Xuchen Song, Yang Liu, Eric Li, Yahui Zhou
Abstract summary: Matrix-Game 2.0 is an interactive world model that generates long videos on the fly via few-step autoregressive diffusion. It can generate high-quality minute-level videos across diverse scenes at an ultra-fast 25 FPS.
Reference score (independently computed attention score): 15.16063778402193
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in interactive video generation have demonstrated diffusion models' potential as world models by capturing complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, severely limiting real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that effectively produces massive amounts (about 1200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; (3) a few-step distillation based on the causal architecture for real-time and streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.
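Abstract (reference translation): Recent progress in interactive video generation shows that diffusion models have potential as world models, since they capture complex physical dynamics and interactive behaviors. However, existing interactive world models rely on bidirectional attention and long inference steps, which severely limits real-time performance. As a result, it is difficult for them to simulate real-world dynamics, in which outcomes must update immediately based on past context and current actions. This work therefore proposes Matrix-Game 2.0, an interactive world model. The framework consists of: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that effectively produces a large amount (about 1200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; and (3) few-step distillation based on a causal architecture for real-time and streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at an ultra-fast 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.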
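
The streaming loop the abstract describes can be illustrated with a toy sketch: each frame is produced from past frames only (causal), conditioned on one action per frame, and refined in a small fixed number of steps. This is not the paper's implementation. `toy_denoise`, `stream_frames`, `frame_budget_ms`, the window size, and the list-of-floats frame representation are all hypothetical stand-ins chosen so the example runs without a model.

```python
import random
from collections import deque
from typing import Deque, Iterable, Iterator, List

Frame = List[float]  # toy stand-in for a latent video frame (assumption)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate, in milliseconds."""
    return 1000.0 / fps


def toy_denoise(noisy: Frame, context: Iterable[Frame], action: float,
                step: int, num_steps: int) -> Frame:
    """Hypothetical placeholder for a learned denoiser.

    Moves the sample toward (mean of past frames + action offset).
    A real model would be a neural network; this only mimics the data flow.
    """
    past = list(context)
    if past:
        target = [sum(f[i] for f in past) / len(past) + action
                  for i in range(len(noisy))]
    else:
        target = [action] * len(noisy)
    weight = (step + 1) / num_steps  # reaches 1.0 on the final step
    return [(1 - weight) * x + weight * t for x, t in zip(noisy, target)]


def stream_frames(actions: Iterable[float], frame_dim: int = 4,
                  num_steps: int = 3, window: int = 8,
                  seed: int = 0) -> Iterator[Frame]:
    """Yield one frame per action, using only already generated frames."""
    rng = random.Random(seed)
    history: Deque[Frame] = deque(maxlen=window)  # bounded causal context
    for action in actions:
        frame = [rng.gauss(0.0, 1.0) for _ in range(frame_dim)]  # start from noise
        for step in range(num_steps):  # few-step refinement
            frame = toy_denoise(frame, history, action, step, num_steps)
        history.append(frame)  # the new frame becomes context for later ones
        yield frame  # emitted immediately, before later actions are known


if __name__ == "__main__":
    for i, f in enumerate(stream_frames([1.0, 0.0, -1.0])):
        print(i, f)
    print("per-frame budget at 25 FPS (ms):", frame_budget_ms(25))
```

Because each frame is yielded before later actions are read, the generator can be driven by live input. At the 25 FPS quoted in the abstract, the whole inner loop would have to finish within 1000 / 25 = 40 ms per frame, which is why the number of refinement steps matters.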

Related papers