Fugu-MT Paper Translation (Abstract): EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

Paper overview: EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

arxiv url: http://arxiv.org/abs/2601.01050v1
Date: Sat, 03 Jan 2026 03:08:48 GMT
Status: Translation complete
Last updated in system: 2026-01-06 16:25:21.971127
Title: EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos
Authors: Hongming Fu, Wenjia Wang, Xiaozhen Qiao, Shuo Yang, Zheng Liu, Bo Zhao
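Abstract summary: EgoGrasp is the first method to reconstruct world-space hand-object interactions (W-HOI) from egocentric monocular videos with dynamic cameras in the wild. Experiments show that the method achieves state-of-the-art performance in W-HOI reconstruction.
Reference score (independently computed attention score): 25.047225764745978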
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose EgoGrasp, the first method to reconstruct world-space hand-object interactions (W-HOI) from egocentric monocular videos with dynamic cameras in the wild. Accurate W-HOI reconstruction is critical for understanding human behavior and enabling applications in embodied intelligence and virtual reality. However, existing hand-object interaction (HOI) methods are limited to single images or camera coordinates and fail to model temporal dynamics or consistent global trajectories. Some recent approaches attempt world-space hand estimation but overlook object poses and HOI constraints. Their performance also suffers under the severe camera motion and frequent occlusions common in egocentric in-the-wild videos. To address these challenges, we introduce a multi-stage framework with a robust preprocessing pipeline built on newly developed spatial intelligence models, a whole-body HOI prior model based on decoupled diffusion models, and a multi-objective test-time optimization paradigm. Our HOI prior model is template-free and scalable to multiple objects. Experiments show that our method achieves state-of-the-art performance in W-HOI reconstruction.

Related papers