Fugu-MT Paper Translation (Abstract): MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

Paper overview: MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

arxiv url: http://arxiv.org/abs/2605.29360v1
Date: Thu, 28 May 2026 04:58:15 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:55.746259
Title: MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
Title (reference translation): MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
Authors: Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang, Jiayi Zhou, Jiaming Ji, Juntao Dai, Jiawei Chen, Boyuan Chen, Yaodong Yang
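Abstract summary: We introduce MiraBench, a hierarchical benchmark that defines action-conditioned reliability as a core evaluation target for robotic world models. To support this evaluation, we curate a human-annotated corpus with over 16,000 judgments across tasks, failure categories, and leading world models. MiraBench reveals three central findings: visual fidelity is a poor proxy for action fidelity; increasing model scale does not reliably improve action following; and optimism bias is pervasive across current systems.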
Reference score (independently computed attention score): 25.87580992111249
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Action-conditioned world models are increasingly used as scalable simulators for robot learning, yet current evaluations provide limited evidence that their predictions are reliable under the actions they condition on. Existing benchmarks largely emphasize visual fidelity, leaving unclear whether predicted futures are physically plausible, faithful to commanded actions, and calibrated to failure when actions should not succeed. We introduce MiraBench, a hierarchical benchmark that defines action-conditioned reliability as a core evaluation target for robotic world models. MiraBench decomposes this target into three progressively demanding levels: Physics Adherence, which evaluates reference-free physical consistency; Action-Following Fidelity, which measures whether predictions respect task-relevant action inputs; and Optimism Bias Detection, which probes the tendency to predict successful outcomes under failure-inducing actions. To support this evaluation, we curate a human-annotated corpus with over 16,000 judgments across tasks, failure categories, and leading world models. We evaluate 12 representative model configurations spanning vector-conditioned robotic world models, text-conditioned generative world models, open-weight systems, closed-source systems, and multiple model scales. Across this broad model landscape, MiraBench reveals three central findings: visual fidelity is a poor proxy for action fidelity; increasing model scale does not reliably improve action following; and optimism bias is pervasive across current systems. By shifting evaluation from appearance to action-conditioned reliability, MiraBench provides a diagnostic foundation for assessing and improving robotic world models as faithful simulators.
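Abstract (reference translation): Action-conditioned world models are increasingly used as scalable simulators for robot learning, but current evaluations offer only limited evidence that their predictions are reliable under the actions they are conditioned on. Existing benchmarks mainly emphasize visual fidelity, so it remains unclear whether predicted futures are physically plausible, faithful to the commanded actions, and calibrated to failure when an action should not succeed. This paper introduces MiraBench, a hierarchical benchmark that defines action-conditioned reliability as a core evaluation target for robotic world models. MiraBench decomposes this target into three progressively demanding levels: Physics Adherence, which evaluates reference-free physical consistency; Action-Following Fidelity, which measures whether predictions respect task-relevant action inputs; and Optimism Bias Detection, which probes the tendency to predict successful outcomes under failure-inducing actions. To support this evaluation, the authors curate a human-annotated corpus with over 16,000 judgments across tasks, failure categories, and leading world models. They evaluate 12 representative model configurations spanning vector-conditioned robotic world models, text-conditioned generative world models, open-weight systems, closed-source systems, and multiple model scales. Across these models, MiraBench reveals three central findings: visual fidelity is a poor proxy for action fidelity; increasing model scale does not reliably improve action following; and optimism bias is pervasive across current systems. By shifting evaluation from appearance to action-conditioned reliability, MiraBench provides a diagnostic foundation for assessing and improving robotic world models as faithful simulators.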


Related papers