AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand
Pose Estimation
- URL: http://arxiv.org/abs/2304.12301v1
- Date: Mon, 24 Apr 2023 17:52:57 GMT
- Title: AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand
Pose Estimation
- Authors: Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, Luan Tran, Cem
Keskin
- Abstract summary: We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations.
AssemblyHands provides 3.0M annotated images, including 490K egocentric images.
Our study shows that having higher-quality hand poses directly improves the ability to recognize actions.
- Score: 26.261767086366866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present AssemblyHands, a large-scale benchmark dataset with accurate 3D
hand pose annotations, to facilitate the study of egocentric activities with
challenging hand-object interactions. The dataset includes synchronized
egocentric and exocentric images sampled from the recent Assembly101 dataset,
in which participants assemble and disassemble take-apart toys. To obtain
high-quality 3D hand pose annotations for the egocentric images, we develop an
efficient pipeline, where we use an initial set of manual annotations to train
a model to automatically annotate a much larger dataset. Our annotation model
uses multi-view feature fusion and an iterative refinement scheme, and achieves
an average keypoint error of 4.20 mm, which is 85% lower than the error of the
original annotations in Assembly101. AssemblyHands provides 3.0M annotated
images, including 490K egocentric images, making it the largest existing
benchmark dataset for egocentric 3D hand pose estimation. Using this data, we
develop a strong single-view baseline of 3D hand pose estimation from
egocentric images. Furthermore, we design a novel action classification task to
evaluate predicted 3D hand poses. Our study shows that having higher-quality
hand poses directly improves the ability to recognize actions.
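The 4.20 mm figure above, like the MPJPE numbers quoted in the related papers below, is a mean Euclidean error over 3D keypoints. The snippet below is a minimal illustrative sketch of how such an error is typically computed; the 21-joint hand layout and the wrist-relative alignment are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def mpjpe_mm(pred: np.ndarray, gt: np.ndarray, root_idx: int = 0) -> float:
    """Mean per-joint position error in millimetres.

    pred, gt: (N, 21, 3) arrays of 3D hand keypoints in millimetres
    (21 joints per hand is the common convention; assumed here).
    root_idx: joint used for root-relative alignment (wrist assumed).
    """
    # Root-relative alignment: subtract the wrist position so the error
    # measures the pose rather than the absolute hand location.
    pred_rel = pred - pred[:, root_idx:root_idx + 1, :]
    gt_rel = gt - gt[:, root_idx:root_idx + 1, :]
    # Euclidean distance per joint, then average over joints and samples.
    per_joint = np.linalg.norm(pred_rel - gt_rel, axis=-1)
    return float(per_joint.mean())

# Example with random data: 100 frames, 21 joints, 3D coordinates in mm.
pred = np.random.randn(100, 21, 3) * 5.0
gt = np.random.randn(100, 21, 3) * 5.0
print(f"MPJPE: {mpjpe_mm(pred, gt):.2f} mm")
```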
Related papers
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023
Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction [11.551318550321938]
Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image.
We adopt ViT-based backbones and a simple regressor for 3D keypoint prediction, which provides strong model baselines; a minimal sketch of this kind of pipeline is given after this list.
Our method achieved 12.21 mm MPJPE on the test dataset, taking first place in the Egocentric 3D Hand Pose Estimation challenge.
arXiv Detail & Related papers (2023-10-07T10:25:50Z)
- Ego2HandsPose: A Dataset for Egocentric Two-hand 3D Global Pose
Estimation [0.0]
Ego2HandsPose is the first dataset that enables color-based two-hand 3D tracking in unseen domains.
We develop a set of parametric fitting algorithms to enable 1) 3D hand pose annotation using a single image, 2) automatic conversion from 2D to 3D hand poses and 3) accurate two-hand tracking with temporal consistency.
arXiv Detail & Related papers (2022-06-10T07:50:45Z)
- Understanding Egocentric Hand-Object Interactions from Hand Pose
Estimation [24.68535915849555]
We propose a method to label a dataset that contains egocentric images in a pair-wise manner.
We also use the collected pairwise data to train our encoder-decoder style network, which has been proven efficient.
arXiv Detail & Related papers (2021-09-29T18:34:06Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction
Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand
Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and
Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
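The first-place challenge entry above describes its model only at a high level: a ViT backbone feeding a simple regressor for 3D keypoints. Below is a minimal sketch of that kind of single-view pipeline, using torchvision's vit_b_16 as the backbone; the MLP sizes, the 224x224 input resolution, and the 21-joint output are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

class ViTKeypointRegressor(nn.Module):
    """ViT backbone + MLP regressor for 3D hand keypoints (illustrative sketch)."""

    def __init__(self, num_joints: int = 21):
        super().__init__()
        vit = vit_b_16(weights=None)      # backbone; pretrained weights optional
        vit.heads = nn.Identity()         # drop the classification head, keep CLS features
        self.backbone = vit
        self.regressor = nn.Sequential(   # simple MLP head; sizes are assumptions
            nn.Linear(768, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_joints * 3),
        )
        self.num_joints = num_joints

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, 224, 224) egocentric crops around the hand
        feats = self.backbone(images)               # (B, 768) class-token features
        joints = self.regressor(feats)              # (B, num_joints * 3)
        return joints.view(-1, self.num_joints, 3)  # (B, 21, 3) 3D keypoints

model = ViTKeypointRegressor()
dummy = torch.randn(2, 3, 224, 224)
print(model(dummy).shape)  # torch.Size([2, 21, 3])
```

A model of this shape would then be scored with the same kind of mean keypoint error sketched after the abstract above.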