Fugu-MT Paper Translation (Abstract): Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

Paper overview: Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

arxiv url: http://arxiv.org/abs/2511.12878v2
Date: Tue, 18 Nov 2025 05:00:56 GMT
Status: Translation complete
Last updated in system: 2025-11-19 13:59:16.794443
Title: Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
Title (reference translation): Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views
Authors: Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Yu Zheng, Erhang Zhang, Xieyuanli Chen, Hesheng Wang
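Abstract summary: We propose a universal hand motion forecasting framework that considers multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances. A novel dual-branch diffusion is proposed to concurrently predict human head and hand movements and to capture their motion synergy in egocentric vision. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms.
Reference score (independently computed attention measure): 40.35520614736267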
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate possible future hand waypoints, but they still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we present a universal hand motion forecasting framework that considers multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. We harmonize multiple modalities through vision-language fusion, global context incorporation, and task-aware text embedding injection to forecast hand waypoints in both 2D and 3D spaces. A novel dual-branch diffusion is proposed to concurrently predict human head and hand movements, capturing their motion synergy in egocentric vision. By introducing target indicators, the prediction model can forecast the waypoints of specific wrist or finger joints, in addition to the widely studied hand center points. We also enable Uni-Hand to predict hand-object interaction states (contact/separation) to better facilitate downstream tasks. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. The experimental results on multiple publicly available datasets and our newly proposed benchmarks demonstrate that Uni-Hand achieves state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation in multiple downstream tasks also demonstrates impressive human-robot policy transfer that enables robotic manipulation, as well as effective feature enhancement for action anticipation/recognition.
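Abstract (reference translation): Forecasting how human hands move in egocentric views is essential for applications such as augmented reality and human-robot policy transfer. In recent years, hand trajectory prediction methods have been developed to generate possible future hand waypoints, but they suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we propose a universal hand motion forecasting framework that considers multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. To forecast hand waypoints in both 2D and 3D space, we harmonize multiple modalities through vision-language fusion, global context incorporation, and task-aware text embedding injection. A novel dual-branch diffusion is proposed to predict human head and hand movements concurrently and to capture their motion synergy in egocentric vision. By introducing target indicators, the model can forecast the waypoints of specific wrist or finger joints. In addition, Uni-Hand predicts hand-object interaction states (contact/separation) to better support downstream tasks. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. Experimental results on multiple public datasets and the newly proposed benchmarks show that Uni-Hand achieves state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation in multiple downstream tasks also shows impressive human-robot policy transfer that enables robotic manipulation, as well as effective feature enhancement for action anticipation/recognition.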

Related papers