Fugu-MT paper translation (abstract): Foundational World Models Accurately Detect Bimanual Manipulator Failures

Paper summary: Foundational World Models Accurately Detect Bimanual Manipulator Failures

arxiv url: http://arxiv.org/abs/2603.06987v1
Date: Sat, 07 Mar 2026 02:11:29 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:13.584648
Title: Foundational World Models Accurately Detect Bimanual Manipulator Failures
Title (reference translation): Foundational world models that accurately detect bimanual manipulator failures
Authors: Isaac R. Ward, Michelle Ho, Houjun Liu, Aaron Feldman, Joseph Vincent, Liam Kruse, Sean Cheong, Duncan Eddy, Mykel J. Kochenderfer, Mac Schwager
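Abstract summary: We train a probabilistic, history-informed world model within the compressed latent space of a pretrained vision foundation model. The model outputs uncertainty estimates alongside its predictions, and these estimates serve as non-conformity scores within a conformal prediction framework. We also show that our approach requires approximately one twentieth of the trainable parameters of the next-best learning-based approach.
Reference score (independently computed attention metric): 21.93685012734004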
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying visuomotor robots at scale is challenging due to the potential for anomalous failures to degrade performance, cause damage, or endanger human life. Bimanual manipulators are no exception; these robots have vast state spaces composed of high-dimensional images and proprioceptive signals. Explicitly defining failure modes within such state spaces is infeasible. In this work, we overcome these challenges by training a probabilistic, history-informed world model within the compressed latent space of a pretrained vision foundation model (NVIDIA's Cosmos Tokenizer). The model outputs uncertainty estimates alongside its predictions; these estimates serve as non-conformity scores within a conformal prediction framework. We use these scores to develop a runtime monitor, correlating periods of high uncertainty with anomalous failures. To test these methods, we use the simulated Push-T environment and the Bimanual Cable Manipulation dataset, the latter of which we introduce in this work. This new dataset features trajectories with multiple synchronized camera views, proprioceptive signals, and annotated failures from a challenging data center maintenance task. We benchmark our methods against baselines from the anomaly detection and out-of-distribution detection literature, and show that our approach considerably outperforms statistical techniques. Furthermore, we show that our approach requires approximately one twentieth of the trainable parameters of the next-best learning-based approach, yet outperforms it by 3.8% in terms of failure detection rate, paving the way toward safely deploying manipulator robots in real-world environments where reliability is non-negotiable.
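Abstract (reference translation): Deploying visuomotor robots at scale is difficult because anomalous failures can degrade performance, cause damage, or endanger human life. Bimanual manipulators are no exception: these robots have vast state spaces made up of high-dimensional images and proprioceptive signals, and explicitly defining failure modes within such state spaces is not feasible. In this work, we overcome these challenges by training a probabilistic, history-informed world model within the compressed latent space of a pretrained vision foundation model (NVIDIA's Cosmos Tokenizer). The model outputs uncertainty estimates together with its predictions, and these estimates serve as non-conformity scores within a conformal prediction framework. We use these scores to build a runtime monitor that associates periods of high uncertainty with anomalous failures. To test these methods, we use the simulated Push-T environment and the Bimanual Cable Manipulation dataset, which we introduce in this work. The new dataset contains trajectories with multiple synchronized camera views, proprioceptive signals, and annotated failures from a challenging data center maintenance task. We benchmark our methods against baselines from the anomaly detection and out-of-distribution detection literature and show that our approach considerably outperforms statistical techniques. We further show that our approach requires approximately one twentieth of the trainable parameters of the next-best learning-based approach while outperforming it by 3.8% in failure detection rate, paving the way toward safely deploying manipulator robots in real-world environments where reliability is non-negotiable.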
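
The abstract describes using uncertainty estimates as non-conformity scores inside a conformal prediction framework to build a runtime monitor. The sketch below illustrates the general idea with a generic split-conformal threshold. It is not the authors' implementation: the abstract does not specify how scores are computed or calibrated, and the names `conformal_threshold`, `flag_anomalies`, and the miscoverage level `alpha` are assumptions introduced here for illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal quantile of non-conformity scores from nominal runs.

    Hypothetical helper: `calibration_scores` stands in for uncertainty
    estimates collected on failure-free trajectories.
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee this alpha.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def flag_anomalies(runtime_scores, threshold):
    """Mark each time step whose score exceeds the calibrated threshold."""
    return [score > threshold for score in runtime_scores]


if __name__ == "__main__":
    nominal = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]  # made-up example values
    tau = conformal_threshold(nominal, alpha=0.25)
    print(tau, flag_anomalies([0.9, 1.25, 2.4], tau))
```

Under the usual exchangeability assumption, a score from a new nominal time step exceeds this threshold with probability at most `alpha`, so sustained exceedances are a reasonable signal to surface as a possible failure.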

Related papers