Fugu-MT Paper Translation (Summary): SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

Paper summary: SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

arxiv url: http://arxiv.org/abs/2605.20373v1
Date: Tue, 19 May 2026 18:24:05 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.319254
Title: SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
Title (reference translation): SUGAR: A Scalable Human-Video-Driven General-Purpose Humanoid Loco-Manipulation Learning Framework
Authors: Tianshu Wu, Xiangqi Kong, Yue Chen, Qize Yu, Hang Ye, Jia Li, Yizhou Wang, Hao Dong
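Abstract summary: Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. SUGAR is a scalable data-driven framework that converts diverse human videos into deployable humanoid loco-manipulation skills. SUGAR is evaluated on six representative loco-manipulation tasks in simulation and on real-world humanoid hardware.
Reference score (independently computed attention measure): 27.050974194855964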
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability. While human videos capture diverse human behaviors, motion priors inferred from them are inherently imperfect, suffering from occlusion, contact artifacts, and retargeting errors that render them unsuitable for direct policy learning. To address this, we present SUGAR, a scalable data-driven framework that converts diverse human videos into deployable humanoid loco-manipulation skills, without any task-specific reward engineering or reference-motion conditioning at inference. SUGAR proceeds in three stages. First, a fully automated pipeline extracts kinematic interaction priors including human-object motion trajectories and contact labels from unstructured human videos. Second, a privileged physics-based refiner uses a unified mimic reward and progressive state pool to transform imperfect priors into physically feasible, high-fidelity skills. Third, refined skills are distilled into a hierarchical autonomous policy consisting of a command generator and a command tracker. We evaluate SUGAR on six representative loco-manipulation tasks in simulation and real-world humanoid hardware. Our method substantially outperforms reference-tracking baselines, and performance scales clearly with the amount of human video data. It also achieves zero-shot real-world transfer with reliable closed-loop execution, autonomous failure recovery, and stable long-horizon performance under external perturbations. Project Page: https://tianshuwu.github.io/sugar-humanoid/
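Abstract (reference translation): Building humanoid robots that can generalize whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability. Human videos capture diverse human behavior, but the motion priors inferred from them are inherently imperfect: occlusion, contact artifacts, and retargeting errors make them unsuitable for direct policy learning. To address this, SUGAR is presented as a scalable data-driven framework that converts diverse human videos into deployable humanoid loco-manipulation skills without task-specific reward engineering or reference-motion conditioning at inference. SUGAR proceeds in three stages. First, a fully automated pipeline extracts kinematic interaction priors, including human-object motion trajectories and contact labels, from unstructured human videos. Second, a privileged physics-based refiner uses a unified mimic reward and a progressive state pool to transform the imperfect priors into physically feasible, high-fidelity skills. Third, the refined skills are distilled into a hierarchical autonomous policy consisting of a command generator and a command tracker. SUGAR is evaluated on six representative loco-manipulation tasks in simulation and on real-world humanoid hardware. The method substantially outperforms reference-tracking baselines, and its performance scales clearly with the amount of human video data. It also achieves zero-shot real-world transfer with reliable closed-loop execution, autonomous failure recovery, and stable long-horizon performance under external perturbations. Project Page: https://tianshuwu.github.io/sugar-humanoid/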

Related papers