Fugu-MT Paper Translation (Abstract): SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

Paper overview: SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

arxiv url: http://arxiv.org/abs/2605.23345v1
Date: Fri, 22 May 2026 08:06:37 GMT
Status: Translation complete
System update date: 2026-05-25 17:29:20.255133
Title: SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
Authors: Zizhao Tong, Hongfeng Lai, Zeqing Wang, Zhaohu Xing, Kexu Cheng, Haoran Xu, Zhao Pu, Shangwen Zhu, Ruili Feng, Jian Zhao, Yan Zhang, Hao Tang, Yeying Jin, Ling Shao
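Abstract summary: Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. This paper proposes SCOPE, which inserts a conditioning module into each transformer block of a pretrained video diffusion model. It also introduces CrossFPS, a multi-game FPS dataset with frame-aligned action telemetry.
Reference score (independently computed attention score): 49.15128236103093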
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Interactive world models for first-person shooter (FPS) games must resolve high-frequency overlapping control signals at every frame without disrupting unaffected regions. Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. We observe that FPS actions are spatially selective: discrete events such as firing or reloading affect only a localized region around the weapon (the scope), while continuous camera and movement signals govern stable surroundings. We propose SCOPE, which inserts a conditioning module into each transformer block of a pretrained video diffusion model. It reshapes features into per-pixel temporal sequences so that each position computes its action response from local visual content. This separates in-scope effects from out-of-scope generation without segmentation labels. We also introduce CrossFPS, the first multi-game FPS dataset with frame-aligned action telemetry. It comprises 69K clips from 7 titles with 10-DoF controller signals, curated to remove gameplay bias. The model learns general visual-to-action mappings rather than game-specific patterns, enabling zero-shot transfer to unseen scenes. Experiments confirm strong action responsiveness, precise scope separation, and effective cross-game generalization.
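The abstract's central mechanism, reshaping features into per-pixel temporal sequences so that each position computes its own action response, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `per_pixel_action_response`, the single-head attention form, the residual addition, and the projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration. Only the reshape into per-pixel sequences and the 10-dimensional action vector (matching the 10-DoF controller signals) come from the abstract.

```python
import numpy as np

def per_pixel_action_response(features, actions, w_q, w_k, w_v):
    """Hypothetical sketch of per-pixel action conditioning.

    features: (T, H, W, C) video features for T frames.
    actions:  (T, A) one action vector per frame (A = 10 for 10-DoF input).
    w_q: (C, D), w_k: (A, D), w_v: (A, C) are assumed projection matrices.
    Returns conditioned features with shape (T, H, W, C).
    """
    T, H, W, C = features.shape

    # Reshape into per-pixel temporal sequences: one length-T sequence per position.
    seq = features.transpose(1, 2, 0, 3).reshape(H * W, T, C)

    # Queries come from local visual content; keys and values come from actions.
    q = seq @ w_q        # (H*W, T, D)
    k = actions @ w_k    # (T, D)
    v = actions @ w_v    # (T, C)

    # Each pixel's sequence attends over the per-frame actions independently.
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (H*W, T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Residual update (an assumption): add the action response to the features.
    out = seq + weights @ v                          # (H*W, T, C)

    # Restore the original (T, H, W, C) layout.
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 5, 8))   # T=4 frames, 3x5 grid, C=8 channels
actions = rng.normal(size=(4, 10))         # 10-DoF action per frame
w_q = rng.normal(size=(8, 6))
w_k = rng.normal(size=(10, 6))
w_v = rng.normal(size=(10, 8))
conditioned = per_pixel_action_response(features, actions, w_q, w_k, w_v)
print(conditioned.shape)
```

Because the query at each position depends only on that position's features, two pixels with different visual content can respond differently to the same action, which is one plausible reading of how in-scope and out-of-scope regions could separate without segmentation labels.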


Related papers