Fugu-MT Paper Translation (Summary): HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

Paper summary: HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

arxiv url: http://arxiv.org/abs/2604.07993v1
Date: Thu, 09 Apr 2026 09:01:43 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.818476
Title: HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
Title (reference translation): HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
Authors: Shuanghao Bai, Meng Li, Xinyuan Lv, Jiawei Wang, Xinhua Wang, Fei Liao, Chengkai Hou, Langzhe Gu, Wanqi Zhou, Kun Wu, Ziluo Ding, Zhiyuan Xu, Lei Sun, Shanghang Zhang, Zhengping Che, Jian Tang, Badong Chen
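Abstract summary: HEX is a state-centric framework for coordinated manipulation on humanoid robots. It incorporates a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and it achieves state-of-the-art performance in task success rate and generalization.
Reference score (independently computed attention score): 74.34984994596813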
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.
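Abstract (reference translation): Humans perform complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat the robot's body parts largely independently, which makes high-DoF humanoid control difficult and often unstable. We propose HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To capture temporal visual context efficiently, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It also uses a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.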
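
The abstract names a residual-gated fusion mechanism but does not give its formula. As a rough illustration only, the sketch below shows one generic form of gated residual fusion: a per-dimension sigmoid gate decides how much of the proprioceptive feature is added on top of the visual-language feature. The function name, the element-wise gate, and the parameters `w_v`, `w_p`, and `bias` are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def residual_gated_fusion(
    visual: Sequence[float],
    proprio: Sequence[float],
    w_v: Sequence[float],
    w_p: Sequence[float],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical element-wise gated residual fusion (not HEX's actual layer).

    gate_i  = sigmoid(w_v[i] * visual[i] + w_p[i] * proprio[i] + bias[i])
    fused_i = visual[i] + gate_i * proprio[i]
    """
    n = len(visual)
    if not (len(proprio) == len(w_v) == len(w_p) == len(bias) == n):
        raise ValueError("all inputs must have the same length")
    fused = []
    for i in range(n):
        gate = sigmoid(w_v[i] * visual[i] + w_p[i] * proprio[i] + bias[i])
        # Residual path: the visual-language feature always passes through;
        # the gate only scales the proprioceptive contribution.
        fused.append(visual[i] + gate * proprio[i])
    return fused
```

With a strongly negative bias the gate closes and the output stays close to the visual-language feature; with zero weights and zero bias the gate sits at 0.5 and half of the proprioceptive feature is added. A learned version would replace the fixed parameters with trained ones.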

Related papers