Fugu-MT Paper Translation (Overview): Learning Versatile Humanoid Manipulation with Touch Dreaming

Paper overview: Learning Versatile Humanoid Manipulation with Touch Dreaming

arxiv url: http://arxiv.org/abs/2604.13015v1
Date: Tue, 14 Apr 2026 17:54:17 GMT
Status: Translation complete
Last updated (in system): 2026-04-15 19:11:32.596195
Title: Learning Versatile Humanoid Manipulation with Touch Dreaming
Authors: Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao,
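Abstract summary: We study contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution. We then propose HTD (Humanoid Transformer with Touch Dreaming), a multimodal encoder-decoder Transformer.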
Reference score (independently computed attention score): 33.65998002598862
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. Built on this controller, we develop a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks (Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving), HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world. Project webpage: humanoid-touch-dream.github.io.
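The single-stage objective described in the abstract (behavioral cloning on action chunks, augmented with prediction of future hand-joint forces and future tactile latents) can be sketched as a weighted sum of regression losses. This is a minimal sketch under stated assumptions, not the authors' implementation: the squared-error losses, the weights `force_weight` and `tactile_weight`, and the names `mse`, `PolicyOutputs`, `Targets`, and `touch_dreaming_loss` are all hypothetical. The tactile-latent targets are assumed to come from some tactile encoder applied to future tactile frames, and plain Python floats stand in for tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over a horizon of vectors (one vector per future step)."""
    if len(pred) != len(target):
        raise ValueError("prediction and target horizons differ")
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        if len(p_vec) != len(t_vec):
            raise ValueError("vector dimensions differ")
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty prediction")
    return total / count


@dataclass
class PolicyOutputs:
    action_chunk: List[Vector]            # predicted next H actions
    future_forces: List[Vector]           # predicted future hand-joint forces
    future_tactile_latents: List[Vector]  # predicted future tactile latents


@dataclass
class Targets:
    action_chunk: List[Vector]            # demonstrated actions
    future_forces: List[Vector]           # recorded future hand-joint forces
    # Assumption: latents from a tactile encoder run on future tactile frames,
    # treated here as fixed regression targets.
    future_tactile_latents: List[Vector]


def touch_dreaming_loss(
    out: PolicyOutputs,
    tgt: Targets,
    force_weight: float = 1.0,    # hypothetical weight
    tactile_weight: float = 1.0,  # hypothetical weight
) -> Tuple[float, Dict[str, float]]:
    """Behavioral cloning loss plus two auxiliary 'dreaming' terms, summed in one stage."""
    l_bc = mse(out.action_chunk, tgt.action_chunk)
    l_force = mse(out.future_forces, tgt.future_forces)
    l_tactile = mse(out.future_tactile_latents, tgt.future_tactile_latents)
    total = l_bc + force_weight * l_force + tactile_weight * l_tactile
    return total, {"bc": l_bc, "force": l_force, "tactile_latent": l_tactile}


# Toy one-step example with made-up numbers.
out = PolicyOutputs([[0.0, 1.0]], [[2.0]], [[1.0, 1.0]])
tgt = Targets([[0.0, 0.0]], [[0.0]], [[0.0, 0.0]])
total, parts = touch_dreaming_loss(out, tgt, force_weight=0.5, tactile_weight=0.5)
print(total, parts)
```

Because all three terms are computed from the same policy outputs and minimized together, gradients from the force and tactile terms also shape the shared trunk, which is how the abstract motivates the contact-aware representations. Predicting tactile latents rather than raw sensor readings gives the auxiliary head a (presumably more compact) target; the abstract's ablation reports that this latent-space variant outperforms raw tactile prediction.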

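The reported gains are relative, not absolute. Assuming relative improvement is computed as (new - old) / old on average success rates, a 90.9% gain means HTD's average is about 1.909 times that of the stronger baseline; because a success rate cannot exceed 1.0, that baseline's average would then have been at most 1 / 1.909, or about 52.4%. The helper names below are hypothetical, and the 0.22 and 0.42 inputs are illustrative, not figures from the paper.

```python
def relative_improvement(new_rate: float, baseline_rate: float) -> float:
    """Relative gain of new_rate over baseline_rate (0.909 means +90.9%)."""
    if baseline_rate <= 0.0:
        raise ValueError("relative improvement is undefined for a non-positive baseline")
    return (new_rate - baseline_rate) / baseline_rate


def max_baseline_for_gain(rel_gain: float) -> float:
    """Largest baseline success rate that still allows rel_gain when rates are capped at 1.0."""
    return 1.0 / (1.0 + rel_gain)


# Illustrative inputs only (not the paper's results).
gain = relative_improvement(0.42, 0.22)  # roughly 0.909, yet only 20 percentage points
ceiling = max_baseline_for_gain(0.909)   # roughly 0.524
print(f"relative gain: {gain:.1%}, baseline ceiling: {ceiling:.1%}")
```

The same formula reads the ablation's 30% relative gain: it multiplies the raw-tactile variant's success rate by 1.3 rather than adding 30 percentage points.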
