Fugu-MT Paper Translation (Abstract): EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data

Paper overview: EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data

arxiv url: http://arxiv.org/abs/2602.16710v1
Date: Wed, 18 Feb 2026 18:59:05 GMT
Status: Translation complete
System update date: 2026-03-23 08:17:41.548528
Title: EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
Title (reference translation): EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
Authors: Ruijie Zheng, Dantong Niu, Yuqi Xie, Jing Wang, Mengda Xu, Yunfan Jiang, Fernando Castañeda, Fengyuan Hu, You Liang Tan, Letian Fu, Trevor Darrell, Furong Huang, Yuke Zhu, Danfei Xu, Linxi Fan
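Abstract summary: EgoScale is a human-to-dexterous-manipulation transfer framework built on large-scale egocentric human data. It introduces a simple two-stage transfer recipe: large-scale human pretraining followed by lightweight aligned human-robot mid-training. The final policy improves average success rate by 54% over a no-pretraining baseline using a 22-DoF dexterous robotic hand.
Reference score (site-computed attention score): 114.89243396877453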
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human behavior is among the most scalable sources of data for learning physical intelligence, yet how to effectively leverage it for dexterous manipulation remains unclear. While prior work demonstrates human-to-robot transfer in constrained settings, it is unclear whether large-scale human data can support fine-grained, high-degree-of-freedom dexterous manipulation. We present EgoScale, a human-to-dexterous-manipulation transfer framework built on large-scale egocentric human data. We train a Vision-Language-Action (VLA) model on over 20,854 hours of action-labeled egocentric human video, more than 20 times larger than prior efforts, and uncover a log-linear scaling law between human data scale and validation loss. This validation loss strongly correlates with downstream real-robot performance, establishing large-scale human data as a predictable supervision source. Beyond scale, we introduce a simple two-stage transfer recipe: large-scale human pretraining followed by lightweight aligned human-robot mid-training. This enables strong long-horizon dexterous manipulation and one-shot task adaptation with minimal robot supervision. Our final policy improves average success rate by 54% over a no-pretraining baseline using a 22-DoF dexterous robotic hand, and transfers effectively to robots with lower-DoF hands, indicating that large-scale human motion provides a reusable, embodiment-agnostic motor prior.
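Abstract (reference translation): Human behavior is one of the most scalable data sources for learning physical intelligence, but how to use it effectively for dexterous manipulation is still unclear. Prior work has demonstrated human-to-robot transfer in constrained settings, but whether large-scale human data can support fine-grained, high-degree-of-freedom dexterous manipulation is unknown. EgoScale is a human-to-dexterous-manipulation transfer framework built on large-scale egocentric human data. We train a VLA (Vision-Language-Action) model on more than 20,854 hours of action-labeled egocentric human video, over 20 times the scale of prior efforts, and reveal a log-linear scaling law between human data scale and validation loss. This validation loss correlates strongly with downstream real-robot performance, establishing large-scale human data as a predictable source of supervision. Beyond scale, we introduce a simple two-stage transfer recipe: large-scale human pretraining followed by lightweight aligned human-robot mid-training. This enables strong long-horizon dexterous manipulation and one-shot task adaptation with minimal robot supervision. Our final policy improves the average success rate by 54% over a no-pretraining baseline using a 22-DoF dexterous robotic hand and transfers effectively to robots with lower-DoF hands, showing that large-scale human motion provides a reusable, embodiment-agnostic motor prior.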
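Note on the scaling law: the abstract reports a log-linear relation between human data scale and validation loss but does not state its functional form or coefficients. A minimal sketch, assuming the common form loss ≈ a + b * ln(hours), is given below. The function names, the coefficients a = 1.0 and b = -0.05, and the data points are hypothetical placeholders for illustration, not values from the paper.

```python
import numpy as np

def fit_log_linear(hours, val_loss):
    """Least-squares fit of val_loss ~ a + b * ln(hours); returns (a, b).

    Assumed functional form; the abstract only says "log-linear".
    """
    x = np.log(np.asarray(hours, dtype=float))
    y = np.asarray(val_loss, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
    return a, b

def predict_loss(a, b, hours):
    """Evaluate the fitted law at a given data scale (in hours)."""
    return a + b * np.log(hours)

# Synthetic, noise-free placeholder points generated from a = 1.0, b = -0.05.
# These are NOT measurements from the paper.
hours = np.array([1e3, 2e3, 5e3, 1e4, 2e4])
loss = 1.0 - 0.05 * np.log(hours)

a, b = fit_log_linear(hours, loss)
```

With real measurements, a negative fitted b would mean each doubling of data lowers validation loss by a constant amount, |b| * ln(2), which is what makes such a law useful for extrapolation.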


Related papers