Fugu-MT paper translation (abstract): HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

Paper overview: HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

arxiv url: http://arxiv.org/abs/2508.16943v1
Date: Sat, 23 Aug 2025 08:23:14 GMT
Status: translation complete
System-internal update date: 2025-08-26 18:43:45.267509
Title: HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement
Authors: Haozhuo Zhang, Jingkai Sun, Michele Caprio, Jian Tang, Shanghang Zhang, Qiang Zhang, Wei Pan
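Abstract summary: We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control. HumanoidVerse supports consecutive manipulation of multiple objects, guided only by natural language instructions and egocentric camera RGB observations. Our work represents a key step toward robust, general-purpose humanoid agents capable of executing complex, sequential tasks under real-world sensory constraints.
Reference score (independently computed attention level): 51.16740261131198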
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce HumanoidVerse, a novel framework for vision-language guided humanoid control that enables a single physically simulated robot to perform long-horizon, multi-object rearrangement tasks across diverse scenes. Unlike prior methods that operate in fixed settings with single-object interactions, our approach supports consecutive manipulation of multiple objects, guided only by natural language instructions and egocentric camera RGB observations. HumanoidVerse is trained via a multi-stage curriculum using a dual-teacher distillation pipeline, enabling fluid transitions between sub-tasks without requiring environment resets. To support this, we construct a large-scale dataset comprising 350 multi-object tasks spanning four room layouts. Extensive experiments in the Isaac Gym simulator demonstrate that our method significantly outperforms prior state-of-the-art in both task success rate and spatial precision, and generalizes well to unseen environments and instructions. Our work represents a key step toward robust, general-purpose humanoid agents capable of executing complex, sequential tasks under real-world sensory constraints. The video visualization results can be found on the project page: https://haozhuo-zhang.github.io/HumanoidVerse-project-page/.

Related papers