Fugu-MT Paper Translation (Summary): Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Paper summary: Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

arxiv url: http://arxiv.org/abs/2601.12993v1
Date: Mon, 19 Jan 2026 12:20:38 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.841838
Title: Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
Title (reference translation): Being-H0.5: Human-Centric Robot Learning for Cross-Embodiment Generalization
Authors: Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu,
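Abstract summary: This paper introduces Being-H0.5, a foundational Vision-Language-Action model aimed at robust cross-embodiment generalization across diverse robotic platforms. It also presents UniHand-2.0, the largest dataset of its kind to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Being-H0.5 achieves state-of-the-art results on simulated benchmarks such as LIBERO (98.9%) and RoboCasa (53.9%).
Reference score (attention score computed in-house): 38.20385682344082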
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
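The abstract describes the Unified Action Space only as mapping heterogeneous robot controls into "semantically aligned slots". To illustrate that general idea, here is a minimal Python sketch, assuming a hypothetical fixed slot layout: the slot names and widths in `SLOTS`, and the helpers `to_unified` and `masked_mse`, are invented for illustration and are not taken from the paper. Each embodiment writes its native action components into the slots it actually has, and a boolean mask records which dimensions are valid so a shared policy's loss can ignore the rest.

```python
from __future__ import annotations

import numpy as np

# Hypothetical slot layout (an assumption for illustration, NOT Being-H0.5's
# actual layout). Each entry is (semantic slot name, number of dimensions).
SLOTS = {
    "left_arm_eef": 6,        # end-effector pose delta: xyz + roll/pitch/yaw
    "left_gripper": 1,
    "right_arm_eef": 6,
    "right_gripper": 1,
    "left_hand_joints": 12,   # dexterous-hand or tracked human-hand joints
    "right_hand_joints": 12,
    "base_velocity": 3,       # mobile base: vx, vy, yaw rate
}

# Precompute [start, end) offsets of every slot in the shared vector.
OFFSETS: dict[str, tuple[int, int]] = {}
_pos = 0
for _name, _width in SLOTS.items():
    OFFSETS[_name] = (_pos, _pos + _width)
    _pos += _width
UNIFIED_DIM = _pos


def to_unified(native: dict[str, np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Scatter one embodiment's native action components into the shared vector.

    Returns (action, mask). Slots the embodiment does not have stay zero and
    are marked False in the mask.
    """
    action = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    for name, values in native.items():
        start, end = OFFSETS[name]
        values = np.asarray(values, dtype=np.float32)
        if values.shape != (end - start,):
            raise ValueError(
                f"slot {name!r} expects {end - start} values, got shape {values.shape}"
            )
        action[start:end] = values
        mask[start:end] = True
    return action, mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed only over the slots the embodiment controls."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0
```

For example, a single-arm robot with a parallel gripper fills only `right_arm_eef` and `right_gripper` (7 of the 41 dimensions in this made-up layout), while human hand-tracking data might fill only `right_hand_joints`. Both land in the same vector space, which is the property that would let data from one source inform a policy for another.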
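Abstract (reference translation): This paper introduces Being-H0.5, a foundational Vision-Language-Action (VLA) model aimed at robust cross-embodiment generalization across diverse robotic platforms. Existing VLAs often struggle with morphological heterogeneity and data scarcity; we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To this end, we present UniHand-2.0, the largest pre-training recipe to date. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, allowing low-resource robots to bootstrap skills from human data and high-resource platforms. Building on this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm that bridges human demonstrations and robot execution. Architecturally, Being-H0.5 adopts a Mixture-of-Transformers design featuring a Mixture-of-Flow (MoF) framework, which decouples shared motor primitives from specialized embodiment-specific experts. Finally, to stabilize cross-embodiment policies in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift, and Universal Async Chunking, which universalizes chunked control across embodiments with different latency and control profiles. We show empirically that Being-H0.5 achieves state-of-the-art results on simulated benchmarks such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.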
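Universal Async Chunking is only named in the abstract, which says it makes chunked control work across embodiments with different latency and control profiles; its actual mechanism is not described here. One generic ingredient of asynchronous chunked control is aligning a freshly predicted action chunk with the time that passed while the model was running inference, where the number of stale steps depends on each robot's control rate. The Python sketch below shows that generic ingredient under stated assumptions: the helper names `latency_in_steps` and `stitch_chunk` and the linear crossfade are hypothetical choices, not the paper's method.

```python
import math

import numpy as np


def latency_in_steps(latency_s: float, control_hz: float) -> int:
    """Convert wall-clock inference latency into whole control steps.

    Rounds up, with a tiny tolerance so float noise (e.g. 0.1 * 30) does not
    add a spurious extra step. The same latency maps to a different number of
    steps on robots with different control rates.
    """
    return max(0, math.ceil(latency_s * control_hz - 1e-9))


def stitch_chunk(
    new_chunk: np.ndarray,
    old_remaining: np.ndarray,
    delay_steps: int,
    blend_steps: int = 0,
) -> np.ndarray:
    """Return the action queue to execute once a new chunk arrives (hypothetical helper).

    new_chunk:     (H, action_dim) actions predicted for times t0 .. t0 + H - 1,
                   where t0 is when the observation was taken.
    old_remaining: (R, action_dim) not-yet-executed actions of the previous chunk,
                   aligned to the arrival time t0 + delay_steps.
    delay_steps:   control steps that elapsed during inference.
    blend_steps:   leading steps to crossfade linearly from old to new actions.
    """
    if delay_steps >= len(new_chunk):
        raise ValueError("inference latency exceeds the chunk horizon")
    # Drop the actions meant for time steps that have already passed.
    fresh = new_chunk[delay_steps:].copy()
    n = min(blend_steps, len(fresh), len(old_remaining))
    if n > 0:
        # n weights strictly between 0 and 1, shaped (n, 1) to broadcast over action_dim.
        w = np.linspace(0.0, 1.0, n + 2)[1:-1, None]
        fresh[:n] = (1.0 - w) * old_remaining[:n] + w * fresh[:n]
    return fresh
```

With an assumed 0.125 s of inference latency, `latency_in_steps` gives 4 stale steps on a 30 Hz controller but 2 on a 10 Hz controller. Computing the offset per embodiment in this way, rather than hard-coding it, is one plausible reading of what "universal" chunking across different control profiles would require.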
