Fugu-MT Paper Translation (Summary): OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics

Paper Summary: OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics

arxiv url: http://arxiv.org/abs/2606.04463v2
Date: Thu, 04 Jun 2026 13:11:48 GMT
Status: Translation complete
Last updated (internal): 2026-06-05 19:21:33.300666
Title: OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics
Authors: Zhuoyuan Wu, Jun Gao,
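Abstract summary: We present a precise action-conditioned video world model that generalizes across different robot embodiments and enables robot policy evaluation. Compared with existing baselines, the model significantly improves action following, appearance quality, and motion consistency. We further deploy OSCAR to evaluate robot policies from RoboArena.
Reference score (in-house attention score): 6.134835623350618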
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present OSCAR, a precise action-conditioned video world model that generalizes across different robot embodiments and enables robot policy evaluation. Existing video world models face three main challenges for real-world robot evaluation: limited scenario diversity in current robot training datasets, imprecise action following, and poor generalization across embodiments for broad adoption. We tackle these challenges from two perspectives. At its core is a large-scale standardized data pipeline that curates, filters, and deduplicates broad robotics and egocentric human datasets, yielding a clean joint-training dataset that spans diverse tasks, scenarios, actions, and robot embodiments. To condition the video model, we adopt 2D kinematic skeleton rendering as a unified conditioning representation that generalizes across different robot arms or even human hands. We finetune the Cosmos-Predict2.5-2B model on a single GH200 GPU. Our model achieves significant improvement on action following, appearance quality, and motion consistency, compared to existing baselines, which either have a much larger model size or require more GPUs. We further deploy OSCAR to evaluate robot policies from RoboArena. Extensive experiments demonstrate the significant correlation between our virtual policy evaluation in OSCAR and real-world evaluation, paving the way for the future where robot policies can be purely evaluated in virtual generated worlds.
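The abstract names 2D kinematic skeleton rendering as the shared conditioning signal but does not describe how the skeleton image is produced. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that 3D joint positions are already available (for example from forward kinematics for a robot arm, or from a hand-pose estimator for a human hand), that a pinhole camera with intrinsics `K` and extrinsics `R`, `t` is known, and that the skeleton is given as a hypothetical `bones` list of joint-index pairs. The function names `project_points` and `render_skeleton` are illustrative.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world-frame joint positions to Nx2 pixel coordinates (pinhole model)."""
    # Rigid transform into the camera frame: X_cam = R @ X_world + t
    pts_cam = points_world @ R.T + t
    if np.any(pts_cam[:, 2] <= 0):
        raise ValueError("all joints must be in front of the camera (z > 0)")
    # Apply intrinsics, then divide by depth to get (u, v) pixel coordinates
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def render_skeleton(pixels, bones, height, width, samples=200):
    """Rasterize bones (pairs of joint indices) into a binary HxW mask."""
    canvas = np.zeros((height, width), dtype=np.uint8)
    s = np.linspace(0.0, 1.0, samples)[:, None]
    for i, j in bones:
        # Sample points along the 2D segment from joint i to joint j
        seg = (1.0 - s) * pixels[i] + s * pixels[j]
        cols = np.round(seg[:, 0]).astype(int)  # u -> column index
        rows = np.round(seg[:, 1]).astype(int)  # v -> row index
        # Drop samples that fall outside the image
        keep = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
        canvas[rows[keep], cols[keep]] = 1
    return canvas

# Toy 3-joint chain (illustrative values only, not from the paper)
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
joints = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.1, 0.1, 1.0]])
bones = [(0, 1), (1, 2)]
pixels = project_points(joints, K, R, t)
mask = render_skeleton(pixels, bones, height=64, width=64)
```

Because any embodiment reduces to a joint array plus a bone list, the same two functions can in principle draw a robot arm or a human hand. This is the intuition behind using one conditioning image across embodiments. A production renderer would likely also add per-joint colors, line thickness, and occlusion handling.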
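The abstract reports a significant correlation between virtual policy evaluation in OSCAR and real-world evaluation, but it does not state which statistic is used. One common way to check this kind of agreement is to score each policy in both settings and compute Pearson (linear) and Spearman (rank) correlation over the paired scores. That approach is assumed here rather than taken from the paper. The helper names and the score values below are hypothetical placeholders, not results reported by the authors.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-policy success rates (placeholders, NOT results from the paper)
sim_scores = [0.20, 0.55, 0.40, 0.70]    # success rate inside the world model
real_scores = [0.25, 0.60, 0.35, 0.80]   # success rate on the real robot
print(f"Pearson:  {pearson(sim_scores, real_scores):.3f}")
print(f"Spearman: {spearman(sim_scores, real_scores):.3f}")
```

For policy evaluation, rank correlation is often the more relevant number. The practical question is whether the world model orders policies the same way the real robot does, even if the absolute success rates differ.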
