Fugu-MT Paper Translation (Abstract): MobileManiBench: Simplifying Model Verification for Mobile Manipulation

Paper overview: MobileManiBench: Simplifying Model Verification for Mobile Manipulation

arxiv url: http://arxiv.org/abs/2602.05233v1
Date: Thu, 05 Feb 2026 02:49:52 GMT
Status: translation complete
Last updated in system: 2026-03-23 08:17:41.189853
Title: MobileManiBench: Simplifying Model Verification for Mobile Manipulation
Title (reference translation): MobileManiBench: Simplifying Model Verification for Mobile Manipulation
Authors: Wenbo Wang, Fangyun Wei, QiXiu Li, Xi Chen, Yaobo Liang, Chang Xu, Jiaolong Yang, Baining Guo
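Abstract summary: MobileManiBench is a large-scale benchmark for mobile-based robotic manipulation. It features 2 mobile platforms (parallel-gripper and dexterous-hand robots), 2 synchronized cameras (head and right wrist), 630 objects in 20 categories, and 5 skills (open, close, pull, push, pick), with over 100 tasks performed in 100 realistic scenes.
Reference score (independently computed attention score): 70.30578259859512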
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language-action models have advanced robotic manipulation but remain constrained by reliance on the large, teleoperation-collected datasets dominated by the static, tabletop scenes. We propose a simulation-first framework to verify VLA architectures before real-world deployment and introduce MobileManiBench, a large-scale benchmark for mobile-based robotic manipulation. Built on NVIDIA Isaac Sim and powered by reinforcement learning, our pipeline autonomously generates diverse manipulation trajectories with rich annotations (language instructions, multi-view RGB-depth-segmentation images, synchronized object/robot states and actions). MobileManiBench features 2 mobile platforms (parallel-gripper and dexterous-hand robots), 2 synchronized cameras (head and right wrist), 630 objects in 20 categories, 5 skills (open, close, pull, push, pick) with over 100 tasks performed in 100 realistic scenes, yielding 300K trajectories. This design enables controlled, scalable studies of robot embodiments, sensing modalities, and policy architectures, accelerating research on data efficiency and generalization. We benchmark representative VLA models and report insights into perception, reasoning, and control in complex simulated environments.
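Abstract (reference translation): Vision-language-action (VLA) models have advanced robotic manipulation but remain dependent on large, teleoperation-collected datasets dominated by static tabletop scenes. We propose a simulation-first framework for verifying VLA architectures before real-world deployment and introduce MobileManiBench, a large-scale benchmark for mobile-based robotic manipulation. Built on NVIDIA Isaac Sim and powered by reinforcement learning, our pipeline autonomously generates diverse manipulation trajectories with rich annotations (language instructions, multi-view RGB-depth-segmentation images, and synchronized object/robot states and actions). MobileManiBench features 2 mobile platforms (parallel-gripper and dexterous-hand robots), 2 synchronized cameras (head and right wrist), 630 objects in 20 categories, and 5 skills (open, close, pull, push, pick), with over 100 tasks performed in 100 realistic scenes, yielding 300K trajectories. This design enables controlled, scalable studies of robot embodiments, sensing modalities, and policy architectures, accelerating research on data efficiency and generalization. We benchmark representative VLA models and report insights into perception, reasoning, and control in complex simulated environments.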

Related papers