The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation
- URL: http://arxiv.org/abs/2504.08654v1
- Date: Fri, 11 Apr 2025 15:58:31 GMT
- Title: The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation
- Authors: Masashi Hatano, Zhifan Zhu, Hideo Saito, Dima Damen
- Abstract summary: We propose a method to forecast the 3D trajectories and poses of both hands from an egocentric video. We leverage full-body pose information, allowing other joints to provide constraints on hand motion. We evaluate EgoH4 on the Ego-Exo4D dataset, combining subsets with body and hand annotations.
- Score: 25.320774988055167
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Forecasting hand motion and pose from an egocentric perspective is essential for understanding human intention. However, existing methods focus solely on predicting positions without considering articulation, and only when the hands are visible in the field of view. This limitation overlooks the fact that approximate hand positions can still be inferred even when they are outside the camera's view. In this paper, we propose a method to forecast the 3D trajectories and poses of both hands from an egocentric video, both in and out of the field of view. We propose a diffusion-based transformer architecture for Egocentric Hand Forecasting, EgoH4, which takes the observation sequence and camera poses as input and predicts future 3D motion and poses for both hands of the camera wearer. We leverage full-body pose information, allowing other joints to provide constraints on hand motion. We denoise the hand and body joints jointly, together with a visibility predictor for hand joints and a 3D-to-2D reprojection loss that penalizes 2D error when the hands are in view. We evaluate EgoH4 on the Ego-Exo4D dataset, combining subsets with body and hand annotations. We train on 156K sequences and evaluate on 34K sequences. EgoH4 improves over the baseline by 3.4 cm in ADE for hand trajectory forecasting and by 5.1 cm in MPJPE for hand pose forecasting. Project page: https://masashi-hatano.github.io/EgoH4/
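The training objective sketched in the abstract combines a 3D joint denoising loss with a visibility-masked 3D-to-2D reprojection term. Below is a minimal, illustrative sketch of such losses in PyTorch; the function names, tensor shapes, and pinhole-projection details are assumptions made for illustration, not code from the EgoH4 implementation.

```python
# Illustrative sketch (assumed shapes and names, not the authors' code):
# a 3D joint loss plus a reprojection loss applied only where hand joints are in view.
import torch


def reprojection_loss(pred_joints_3d, gt_joints_2d, visibility, intrinsics):
    """Project predicted 3D joints (camera frame) with a pinhole model and
    penalise 2D error only where the joint is visible in the image.

    pred_joints_3d: (B, J, 3) predicted joints in the camera frame
    gt_joints_2d:   (B, J, 2) ground-truth 2D joints in pixels
    visibility:     (B, J)    1 if the joint is inside the field of view, else 0
    intrinsics:     (B, 3, 3) camera intrinsic matrices
    """
    # Pinhole projection: x = K @ X, then divide by depth.
    proj = torch.einsum("bij,bkj->bki", intrinsics, pred_joints_3d)  # (B, J, 3)
    proj_2d = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
    err = (proj_2d - gt_joints_2d).norm(dim=-1)                      # (B, J)
    mask = visibility.float()
    return (err * mask).sum() / mask.sum().clamp(min=1.0)


def joint_l2_loss(pred_joints_3d, gt_joints_3d):
    """Plain L2 loss on denoised 3D hand and body joints."""
    return (pred_joints_3d - gt_joints_3d).norm(dim=-1).mean()


if __name__ == "__main__":
    B, J = 2, 21                              # batch size, 21 keypoints per hand
    K = torch.eye(3).repeat(B, 1, 1)
    K[:, 0, 0] = K[:, 1, 1] = 500.0           # focal length in pixels (illustrative)
    K[:, 0, 2] = K[:, 1, 2] = 128.0           # principal point (illustrative)
    pred = torch.randn(B, J, 3) * 0.1 + torch.tensor([0.0, 0.0, 2.0])  # ~2 m ahead
    gt3d = pred + 0.01 * torch.randn_like(pred)
    gt2d = torch.randn(B, J, 2) * 10 + 128
    vis = torch.randint(0, 2, (B, J))
    total = joint_l2_loss(pred, gt3d) + reprojection_loss(pred, gt2d, vis, K)
    print(total.item())
```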
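The reported gains are in ADE (average displacement error) for hand trajectories and MPJPE (mean per-joint position error) for hand poses. A small sketch of these metrics under their usual definitions, with assumed tensor shapes rather than the paper's exact evaluation protocol:

```python
# Illustrative metric sketch (assumed shapes, not the paper's evaluation code).
import torch


def ade(pred_traj, gt_traj):
    """Average displacement error over a forecast trajectory.

    pred_traj, gt_traj: (B, T, 3) future 3D positions (e.g. one wrist) in metres.
    """
    return (pred_traj - gt_traj).norm(dim=-1).mean()


def mpjpe(pred_pose, gt_pose):
    """Mean per-joint position error of a forecast hand pose.

    pred_pose, gt_pose: (B, T, J, 3) future 3D joint positions in metres.
    """
    return (pred_pose - gt_pose).norm(dim=-1).mean()


if __name__ == "__main__":
    B, T, J = 4, 30, 21
    gt = torch.randn(B, T, J, 3)
    pred = gt + 0.03 * torch.randn_like(gt)          # ~3 cm of per-coordinate noise
    print(f"ADE (wrist):  {ade(pred[:, :, 0], gt[:, :, 0]).item() * 100:.1f} cm")
    print(f"MPJPE (hand): {mpjpe(pred, gt).item() * 100:.1f} cm")
```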
Related papers
- Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation [67.9563319914377]
This paper investigates the usefulness of rear cameras in the head-mounted device (HMD) design for full-body tracking. We propose a new transformer-based method that refines 2D joint heatmap estimation with multi-view information and heatmap uncertainty. Our experiments show that the new camera configurations with back views provide superior support for 3D pose tracking.
arXiv Detail & Related papers (2025-03-14T17:59:54Z) - Estimating Body and Hand Motion in an Ego-sensed World [62.61989004520802]
We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
arXiv Detail & Related papers (2024-10-04T17:59:57Z) - EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos [9.340890244344497]
Existing methods for forecasting 2D hand positions rely on visual representations and mainly focus on hand-object interactions.
We propose EMAG, an ego-motion-aware and generalizable 2D hand forecasting method.
Our model outperforms prior methods by 1.7% and 7.0% on intra- and cross-dataset evaluations.
arXiv Detail & Related papers (2024-05-30T13:15:18Z) - HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z) - 3D Hand Pose Estimation in Everyday Egocentric Images [12.964086079352262]
We focus on challenges arising from perspective distortion and lack of 3D annotations in the wild.
We present WildHands, a system for 3D hand pose estimation in everyday egocentric images.
arXiv Detail & Related papers (2023-12-11T18:15:47Z) - AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation [26.261767086366866]
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations.
AssemblyHands provides 3.0M annotated images, including 490K egocentric images.
Our study shows that having higher-quality hand poses directly improves the ability to recognize actions.
arXiv Detail & Related papers (2023-04-24T17:52:57Z) - Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z) - Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios [13.59950629234404]
This report describes our 1st-place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation).
In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands and an object are interacting, seen from an egocentric viewpoint.
Our proposed method performs end-to-end multi-hand pose estimation via transformer architecture.
arXiv Detail & Related papers (2022-10-20T16:24:47Z) - 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z) - Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z) - Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.