Fugu-MT Paper Translation (Abstract): UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

Paper overview: UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

arxiv url: http://arxiv.org/abs/2603.22264v1
Date: Mon, 23 Mar 2026 17:49:12 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.828141
Title: UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos
Authors: Gu Zhang, Qicheng Xu, Haozhe Zhang, Jianhan Ma, Long He, Yiming Bao, Zeyu Ping, Zhecheng Yuan, Chenhao Lu, Chengbo Yuan, Tianhai Liang, Xiaoyu Tian, Maanping Shao, Feihong Zhang, Mingyu Ding, Yang Gao, Hao Zhao, Hang Zhao, Huazhe Xu
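Abstract summary: Dexterous manipulation remains challenging because of the cost of collecting real-robot teleoperation data. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a vision-language-action (VLA) policy. UniDex-Dataset, UniDex-VLA, and UniDex-Cap provide a scalable foundation suite for universal dexterous manipulation.
Reference score (independently computed attention score): 65.2981273885678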
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dexterous manipulation remains challenging due to the cost of collecting real-robot teleoperation data, the heterogeneity of hand embodiments, and the high dimensionality of control. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a unified vision-language-action (VLA) policy and a practical human-data capture setup for universal dexterous hand control. First, we construct UniDex-Dataset, a robot-centric dataset of over 50K trajectories across eight dexterous hands (6–24 DoFs), derived from egocentric human video datasets. To transform human data into robot-executable trajectories, we employ a human-in-the-loop retargeting procedure to align fingertip trajectories while preserving plausible hand-object contacts, and we operate on explicit 3D pointclouds with human hands masked to narrow kinematic and visual gaps. Second, we introduce the Function-Actuator-Aligned Space (FAAS), a unified action space that maps functionally similar actuators to shared coordinates, enabling cross-hand transfer. Leveraging FAAS as the action parameterization, we train UniDex-VLA, a 3D VLA policy pretrained on UniDex-Dataset and finetuned with task demonstrations. In addition, we build UniDex-Cap, a simple portable capture setup that records synchronized RGB-D streams and human hand poses and converts them into robot-executable trajectories to enable human-robot data co-training that reduces reliance on costly robot demonstrations. On challenging tool-use tasks across two different hands, UniDex-VLA achieves 81% average task progress and outperforms prior VLA baselines by a large margin, while exhibiting strong spatial, object, and zero-shot cross-hand generalization. Together, UniDex-Dataset, UniDex-VLA, and UniDex-Cap provide a scalable foundation suite for universal dexterous manipulation.
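Illustrative note (not from the paper): the abstract describes FAAS only as a unified action space that maps functionally similar actuators to shared coordinates. The Python sketch below shows one way such a mapping could look, purely as an assumption. The slot names, the two hand tables (HAND_A, HAND_B), and the functions to_shared and from_shared are all hypothetical and are not taken from UniDex.

```python
from typing import Dict, List, Tuple

# Hypothetical shared coordinates, one per functional role (assumed, not from the paper).
SHARED_SLOTS: List[str] = [
    "thumb_flex", "thumb_abduct",
    "index_flex", "middle_flex", "ring_flex", "pinky_flex",
]

# Hypothetical per-hand tables: actuator index -> shared slot name.
HAND_A: Dict[int, str] = {
    0: "thumb_flex", 1: "thumb_abduct", 2: "index_flex",
    3: "middle_flex", 4: "ring_flex", 5: "pinky_flex",
}
HAND_B: Dict[int, str] = {0: "index_flex", 1: "middle_flex", 2: "thumb_flex"}


def to_shared(action: List[float], hand_map: Dict[int, str]) -> Tuple[List[float], List[bool]]:
    """Scatter a hand-specific action into the shared vector.

    Returns the shared values plus a mask marking which slots this hand fills.
    """
    values = [0.0] * len(SHARED_SLOTS)
    mask = [False] * len(SHARED_SLOTS)
    for actuator, slot in hand_map.items():
        i = SHARED_SLOTS.index(slot)
        values[i] = action[actuator]
        mask[i] = True
    return values, mask


def from_shared(values: List[float], hand_map: Dict[int, str]) -> List[float]:
    """Gather, in actuator order, the shared coordinates a given hand can actuate."""
    return [values[SHARED_SLOTS.index(hand_map[a])] for a in sorted(hand_map)]


# A six-actuator action on hand A, re-expressed for the three-actuator hand B.
shared, mask = to_shared([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], HAND_A)
hand_b_action = from_shared(shared, HAND_B)
```

In this toy setup, hand B reads back only the index, middle, and thumb flexion coordinates, which is the sense in which a shared coordinate layout could support cross-hand transfer. The actual FAAS definition may differ in its slots, scaling, and handling of missing actuators.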

Related papers