HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
- URL: http://arxiv.org/abs/2501.02973v1
- Date: Mon, 06 Jan 2025 12:29:33 GMT
- Title: HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
- Authors: Jinglei Zhang, Jiankang Deng, Chao Ma, Rolandos Alexandros Potamias
- Abstract summary: HaWoR is a high-fidelity method for hand motion reconstruction in world coordinates from egocentric videos.
To achieve precise camera trajectory estimation, we propose an adaptive egocentric SLAM framework.
We demonstrate that HaWoR achieves state-of-the-art performance on both hand motion reconstruction and world-frame camera trajectory estimation.
- Abstract: Despite advances in 3D hand pose estimation, current methods predominantly focus on single-image 3D hand reconstruction in the camera frame, overlooking the world-space motion of the hands. This limitation prohibits their direct use in egocentric video settings, where hands and camera are continuously in motion. In this work, we propose HaWoR, a high-fidelity method for hand motion reconstruction in world coordinates from egocentric videos. We propose to decouple the task by reconstructing the hand motion in the camera space and estimating the camera trajectory in the world coordinate system. To achieve precise camera trajectory estimation, we propose an adaptive egocentric SLAM framework that addresses the shortcomings of traditional SLAM methods, providing robust performance under challenging camera dynamics. To ensure robust hand motion trajectories, even when the hands move out of the view frustum, we devise a novel motion infiller network that effectively completes the missing frames of the sequence. Through extensive quantitative and qualitative evaluations, we demonstrate that HaWoR achieves state-of-the-art performance on both hand motion reconstruction and world-frame camera trajectory estimation across different egocentric benchmark datasets. Code and models are available at https://hawor-project.github.io/.
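As a rough illustration of the decoupling described in the abstract, the sketch below composes camera-space hand joints with a camera-to-world trajectory (for example, one produced by a SLAM system) to obtain world-space hand motion, and uses plain linear interpolation as a naive stand-in for the learned motion infiller on frames where the hand leaves the view frustum. The function names and array shapes are illustrative assumptions, not HaWoR's actual interface.

```python
import numpy as np

def hand_to_world(R_wc, t_wc, joints_cam):
    """Map per-frame hand joints from camera space to world space.

    R_wc:       (T, 3, 3) camera-to-world rotations
    t_wc:       (T, 3)    camera-to-world translations
    joints_cam: (T, J, 3) hand joints in camera coordinates; NaN where the
                hand is outside the view frustum
    """
    # x_world = R_wc @ x_cam + t_wc, applied per frame and per joint
    return np.einsum("tij,tkj->tki", R_wc, joints_cam) + t_wc[:, None, :]

def infill_missing(joints_world):
    """Naive stand-in for a learned motion infiller: linearly interpolate
    each joint coordinate across frames where the hand was not observed."""
    T, J, _ = joints_world.shape
    out = joints_world.copy()
    frames = np.arange(T)
    for j in range(J):
        for c in range(3):
            track = out[:, j, c]  # view into `out`
            missing = np.isnan(track)
            if missing.any() and not missing.all():
                track[missing] = np.interp(frames[missing],
                                           frames[~missing],
                                           track[~missing])
    return out
```

In the method itself, the missing frames are completed by the dedicated motion infiller network rather than by interpolation, and the camera trajectory is provided by the adaptive egocentric SLAM module.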
Related papers
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.
We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.
This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z) - UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos [25.41337525728398]
We introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction.
UniHOI is the first approach to offer fast, dense, and general monocular egocentric HOI scene reconstruction in the presence of motion.
arXiv Detail & Related papers (2024-11-14T02:57:11Z) - World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system.
The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame (a toy construction of such a frame is sketched after this list).
Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z) - DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction [65.46359561104867]
We target the challenge of online 2D and 3D point tracking from unposed monocular camera input.
We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion.
We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
arXiv Detail & Related papers (2024-09-03T17:58:03Z) - Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition, and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z) - Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z) - GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
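A toy construction of a gravity-aligned, per-frame coordinate system, in the spirit of the Gravity-View coordinates entry above: given an estimated gravity direction and the camera viewing direction (both illustrative assumptions here, e.g. from an IMU and the camera optical axis), one can build an orthonormal basis whose vertical axis opposes gravity. This is only a sketch of the general idea; the actual GV construction in that paper may differ.

```python
import numpy as np

def gravity_view_frame(gravity_cam, view_dir_cam=np.array([0.0, 0.0, 1.0])):
    """Toy gravity-aligned basis: rows are the GV axes in camera coordinates.

    gravity_cam:  (3,) gravity direction expressed in camera coordinates
    view_dir_cam: (3,) camera viewing direction in camera coordinates
    """
    up = -gravity_cam / np.linalg.norm(gravity_cam)
    # Project the viewing direction onto the horizontal (gravity-orthogonal) plane.
    forward = view_dir_cam - np.dot(view_dir_cam, up) * up
    forward = forward / np.linalg.norm(forward)  # degenerate if looking straight up/down
    right = np.cross(forward, up)
    return np.stack([right, up, forward])        # x_gv = R @ x_cam
```

Rotating camera-space poses by this basis expresses them in a frame whose vertical axis matches gravity, which is what makes such a system uniquely defined for each frame.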
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.