Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and
Objects for 3D Hand Pose Estimation under Hand-Object Interaction
- URL: http://arxiv.org/abs/2003.13764v2
- Date: Thu, 10 Sep 2020 15:35:17 GMT
- Title: Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and
Objects for 3D Hand Pose Estimation under Hand-Object Interaction
- Authors: Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas
Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang,
Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang,
Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan,
Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei,
Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel,
Romain Brégier, Grégory Rogez, Vincent Lepetit, Tae-Kyun Kim
- Abstract summary: HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
- Score: 137.28465645405655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how well different types of approaches generalise in the task of 3D
hand pose estimation under single hand scenarios and hand-object interaction.
We show that the accuracy of state-of-the-art methods can drop, and that they
fail mostly on poses absent from the training set. Unfortunately, since the
space of hand poses is high-dimensional, it is inherently not feasible to
cover the whole space densely, despite recent efforts in collecting large-scale
training datasets. This sampling problem is even more severe when hands are
interacting with objects and/or inputs are RGB rather than depth images, as RGB
images also vary with lighting conditions and colors. To address these issues,
we designed a public challenge (HANDS'19) to evaluate the abilities of current
3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a
training set. More exactly, HANDS'19 is designed (a) to evaluate the influence
of both depth and color modalities on 3D hand pose estimation, under the
presence or absence of objects; (b) to assess the generalisation abilities
w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to
explore the use of a synthetic hand model to fill the gaps of current datasets.
Through the challenge, the overall accuracy has dramatically improved over the
baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint
error. Our analyses highlight the impacts of data pre-processing, ensemble
approaches, the use of a parametric 3D hand model (MANO), and different HPE
methods/backbones.
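The headline metric above, mean joint error, is the average Euclidean distance between predicted and ground-truth 3D joint positions across all joints and frames. A minimal sketch of this metric is given below; the function name `mean_joint_error` is illustrative, not the challenge's official evaluation code, which may additionally apply alignment or per-axis scoring.

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean per-joint Euclidean distance (e.g. in mm) between predictions
    and ground truth. Both arrays have shape (num_frames, num_joints, 3)."""
    # Per-joint Euclidean distances, then a single mean over frames and joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: 2 frames x 3 joints, every joint offset by 3 mm along x,
# so the mean joint error is exactly 3 mm.
gt = np.zeros((2, 3, 3))
pred = gt.copy()
pred[..., 0] += 3.0
```

A drop from 27mm to 13mm in this metric, as reported for the extrapolation tasks, roughly halves the average per-joint deviation.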
Related papers
- SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition [5.359837526794863]
Hand pose represents key information for action recognition in the egocentric perspective.
We propose to improve egocentric 3D hand pose estimation based on RGB frames only by using pseudo-depth images.
arXiv Detail & Related papers (2024-08-19T14:30:29Z)
- Denoising Diffusion for 3D Hand Pose Estimation from Images [38.20064386142944]
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
arXiv Detail & Related papers (2023-08-18T12:57:22Z)
- Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios [13.59950629234404]
This report describes our 1st place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation track).
In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands and an object interact, seen from the egocentric viewpoint.
Our proposed method performs end-to-end multi-hand pose estimation via transformer architecture.
arXiv Detail & Related papers (2022-10-20T16:24:47Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- What's in your hands? 3D Reconstruction of Generic Objects in Hands [49.12461675219253]
Our work aims to reconstruct hand-held objects given a single RGB image.
In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates.
arXiv Detail & Related papers (2022-04-14T17:59:02Z)
- Self-Supervised 3D Hand Pose Estimation from Monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.