Fugu-MT Paper Translation (Abstract): ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

Paper summary: ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

arxiv url: http://arxiv.org/abs/2603.09170v1
Date: Tue, 10 Mar 2026 04:19:43 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.025339
Title: ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video
Title (reference translation): ZeroWBC: Learning natural visuomotor humanoid control directly from human egocentric video
Authors: Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li
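Abstract summary: We introduce ZeroWBC, a novel framework that learns a natural humanoid visuomotor control policy directly from human egocentric videos. The approach first fine-tunes a Vision-Language Model (VLM) to predict future whole-body motions from text instructions and egocentric visual context. Experiments on the Unitree G1 humanoid robot show that the method outperforms baseline approaches in motion naturalness and versatility.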
Reference score (independently computed attention score): 52.78703020909145
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive control, they are constrained to rigid locomotion patterns and expensive teleoperation data collection, lacking the versatility to execute more human-like natural behaviors such as sitting or kicking. Furthermore, acquiring the necessary real robot teleoperation data is prohibitively expensive and time-consuming. To address these limitations, we introduce ZeroWBC, a novel framework that learns a natural humanoid visuomotor control policy directly from human egocentric videos, eliminating the need for large-scale robot teleoperation data and enabling natural humanoid robot scene-interaction control. Specifically, our approach first fine-tunes a Vision-Language Model (VLM) to predict future whole-body human motions based on text instructions and egocentric visual context, then these generated motions are retargeted to real robot joints and executed via our robust general motion tracking policy for humanoid whole-body control. Extensive experiments on the Unitree G1 humanoid robot demonstrate that our method outperforms baseline approaches in motion naturalness and versatility, successfully establishing a pipeline that eliminates teleoperation data collection overhead for whole-body humanoid control, offering a scalable and efficient paradigm for general humanoid whole-body control.
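Abstract (reference translation): Achieving versatile, naturalistic whole-body control for humanoid robot scene interaction remains a major challenge. Several recent studies have demonstrated autonomous interactive humanoid control, but they are constrained by rigid locomotion patterns and expensive teleoperation data collection, and they lack the versatility to perform human-like natural behaviors such as sitting or kicking. In addition, acquiring the required real-robot teleoperation data is extremely expensive and time-consuming. To address these limitations, the paper introduces ZeroWBC, a new framework that learns a humanoid visuomotor control policy directly from human egocentric videos, removing the need for large-scale robot teleoperation data and enabling natural scene-interaction control for humanoid robots. Specifically, it first fine-tunes a Vision-Language Model (VLM) to predict future whole-body motions from text instructions and egocentric visual context, then retargets these motions to real robot joints and executes them through a robust general motion tracking policy for humanoid whole-body control. Extensive experiments on the Unitree G1 humanoid robot show that the method outperforms baseline approaches in motion naturalness and versatility, and establish a pipeline that eliminates the teleoperation data collection overhead for whole-body humanoid control, providing a scalable and efficient paradigm for general humanoid whole-body control.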
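
The three stages named in the abstract (motion prediction, retargeting, motion tracking) can be read as a simple data flow. The sketch below only illustrates that flow under stated assumptions: every name in it (`Pipeline`, `predict_motion`, `retarget`, `track`, and the toy stand-in functions) is hypothetical and is not taken from the paper or from any released code, and the toy functions return dummy numbers rather than performing any real inference or control.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical placeholder types; the paper does not specify these.
Frame = List[float]               # stand-in for one egocentric image
HumanMotion = List[List[float]]   # predicted human poses, one per timestep
RobotTargets = List[List[float]]  # robot joint targets, one per timestep


@dataclass
class Pipeline:
    """Chains the three stages described in the abstract."""
    predict_motion: Callable[[str, Sequence[Frame]], HumanMotion]  # stage 1
    retarget: Callable[[HumanMotion], RobotTargets]                # stage 2
    track: Callable[[RobotTargets], RobotTargets]                  # stage 3

    def run(self, instruction: str, frames: Sequence[Frame]) -> RobotTargets:
        human_motion = self.predict_motion(instruction, frames)
        robot_targets = self.retarget(human_motion)
        return self.track(robot_targets)


def toy_predict(instruction: str, frames: Sequence[Frame]) -> HumanMotion:
    # Toy stand-in for the fine-tuned VLM: one dummy pose per input frame.
    return [[1.0, 2.0, 3.0] for _ in frames]


def toy_retarget(motion: HumanMotion) -> RobotTargets:
    # Toy stand-in for retargeting: keep the first two values of each pose.
    return [pose[:2] for pose in motion]


def toy_track(targets: RobotTargets) -> RobotTargets:
    # Toy stand-in for the tracking policy: echo the targets as commands.
    return [list(t) for t in targets]


pipeline = Pipeline(toy_predict, toy_retarget, toy_track)
commands = pipeline.run("sit on the chair", [[0.0], [0.0]])
```

Passing the stages in as callables keeps the sketch independent of any particular model or controller, so each toy function could be swapped for a real component without changing `Pipeline.run`.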
