Ego2HandsPose: A Dataset for Egocentric Two-hand 3D Global Pose
Estimation
- URL: http://arxiv.org/abs/2206.04927v1
- Date: Fri, 10 Jun 2022 07:50:45 GMT
- Title: Ego2HandsPose: A Dataset for Egocentric Two-hand 3D Global Pose
Estimation
- Authors: Fanqing Lin, Tony Martinez
- Abstract summary: Ego2HandsPose is the first dataset that enables color-based two-hand 3D tracking in unseen domains.
We develop a set of parametric fitting algorithms to enable 1) 3D hand pose annotation using a single image, 2) automatic conversion from 2D to 3D hand poses and 3) accurate two-hand tracking with temporal consistency.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Color-based two-hand 3D pose estimation in the global coordinate system is
essential in many applications. However, there are very few datasets dedicated
to this task and no existing dataset supports estimation in a non-laboratory
environment. This is largely attributed to the sophisticated data collection
process required for 3D hand pose annotations, which also leads to difficulty
in obtaining instances with the level of visual diversity needed for estimation
in the wild. Progressing towards this goal, a large-scale dataset Ego2Hands was
recently proposed to address the task of two-hand segmentation and detection in
the wild. The proposed composition-based data generation technique can create
two-hand instances with quality, quantity and diversity that generalize well to
unseen domains. In this work, we present Ego2HandsPose, an extension of
Ego2Hands that contains 3D hand pose annotation and is the first dataset that
enables color-based two-hand 3D tracking in unseen domains. To this end, we
develop a set of parametric fitting algorithms to enable 1) 3D hand pose
annotation using a single image, 2) automatic conversion from 2D to 3D hand
poses and 3) accurate two-hand tracking with temporal consistency. We provide
incremental quantitative analysis on the multi-stage pipeline and show that
training on our dataset achieves state-of-the-art results that significantly
outperform those obtained with other datasets on the task of egocentric
two-hand global 3D pose estimation.
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as it enables: (1) learning token locations for transformer models; (2) directly regressing 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments [34.24004079703609]
We introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale traversals in natural environments.
The data is trajectory-centric with accurate localization and globally aligned point clouds.
We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques.
arXiv Detail & Related papers (2023-12-23T22:27:40Z)
- Decanus to Legatus: Synthetic training for 2D-3D human pose lifting [26.108023246654646]
We propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus).
Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the potential of our framework.
arXiv Detail & Related papers (2022-10-05T13:10:19Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction [33.661745138578596]
We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image.
Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap.
We use appearance and spatial encodings of these locations as input to a transformer, and leverage the attention mechanisms to sort out the correct configuration of the joints.
arXiv Detail & Related papers (2021-04-29T20:19:20Z)
- Two-hand Global 3D Pose Estimation Using Monocular RGB [0.0]
We tackle the challenging task of estimating global 3D joint locations for both hands via only monocular RGB input images.
We propose a novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands.
We present the first work that achieves accurate global 3D hand tracking on both hands using RGB-only inputs.
arXiv Detail & Related papers (2020-06-01T23:53:52Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)