HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction
- URL: http://arxiv.org/abs/2104.14639v1
- Date: Thu, 29 Apr 2021 20:19:20 GMT
- Title: HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction
- Authors: Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit
- Abstract summary: We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image.
Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap.
We use appearance and spatial encodings of these locations as input to a transformer, and leverage the attention mechanisms to sort out the correct configuration of the joints.
- Score: 33.661745138578596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a robust and accurate method for estimating the 3D poses of two
hands in close interaction from a single color image. This is a very
challenging problem, as large occlusions and many confusions between the joints
may happen. Our method starts by extracting a set of potential 2D locations for
the joints of both hands as extrema of a heatmap. We do not require that all
locations correctly correspond to a joint, nor that all the joints are
detected. We use appearance and spatial encodings of these locations as input
to a transformer, and leverage the attention mechanisms to sort out the correct
configuration of the joints and output the 3D poses of both hands. Our approach
thus combines the recognition power of a Transformer with the accuracy of
heatmap-based methods. We also show it can be extended to estimate the 3D pose
of an object manipulated by one or two hands. We evaluate our approach on the
recent and challenging InterHand2.6M and HO-3D datasets. We obtain a 17%
improvement over the baseline. Moreover, we introduce the first dataset of
action sequences of two hands manipulating an object that is fully annotated
in 3D, and we will make it publicly available.
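As a rough illustration of this pipeline, the sketch below (PyTorch) extracts candidate 2D joint locations as local maxima of a heatmap and builds appearance-plus-spatial tokens for a transformer encoder. The peak window, threshold, and module sizes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    """Candidate 2D joint locations = local maxima of an (H, W) heatmap.

    Some peaks may be spurious and some joints may be missed; the
    transformer is expected to sort this out downstream.
    """
    pooled = F.max_pool2d(heatmap[None, None], kernel_size=3, stride=1, padding=1)[0, 0]
    keep = (heatmap == pooled) & (heatmap > threshold)
    return keep.nonzero()  # (K, 2) integer (y, x) candidates

class KeypointTokens(torch.nn.Module):
    """Fuse appearance features and spatial encodings into per-candidate tokens."""

    def __init__(self, feat_dim: int, d_model: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(feat_dim + 2, d_model)

    def forward(self, feature_map: torch.Tensor, peaks: torch.Tensor) -> torch.Tensor:
        C, H, W = feature_map.shape
        appearance = feature_map[:, peaks[:, 0], peaks[:, 1]].T        # (K, C)
        xy = peaks.float() / peaks.new_tensor([H - 1, W - 1]).float()  # (K, 2) in [0, 1]
        return self.proj(torch.cat([appearance, xy], dim=1))           # (K, d_model)

# A standard transformer encoder attends over the K candidate tokens; per-token
# heads (not shown) would classify each candidate's joint identity and regress
# the 3D poses of both hands.
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=6,
)
```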
Related papers
- A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image [46.5947382684857]
We propose to extend A2J, the state-of-the-art depth-based 3D single-hand pose estimation method, to the RGB domain under interacting-hand conditions.
A2J is reworked within the Transformer's non-local encoding-decoding framework to build A2J-Transformer.
Experiments on the challenging InterHand2.6M dataset demonstrate that A2J-Transformer achieves state-of-the-art model-free performance; a sketch of the anchor-to-joint idea follows below.
arXiv Detail & Related papers (2023-04-07T13:30:36Z)
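Below is a hedged sketch of the generic anchor-to-joint idea the title refers to: dense anchors each vote for every joint with an offset and a learned weight, and the estimate is the weighted average of the votes. The grid, tensor shapes, and softmax weighting are illustrative assumptions, not the A2J-Transformer implementation.

```python
import torch

def anchor_to_joint(anchors, offsets, logits):
    """anchors: (A, 2) anchor positions
    offsets: (A, J, 2) per-anchor offset votes for each of J joints
    logits:  (A, J) per-anchor informativeness scores
    returns: (J, 2) aggregated joint estimates
    """
    votes = anchors[:, None, :] + offsets           # (A, J, 2) absolute votes
    weights = torch.softmax(logits, dim=0)          # normalize over anchors
    return (weights[..., None] * votes).sum(dim=0)  # weighted average per joint

# Example with a 4x4 anchor grid and 21 hand joints (zero offsets and uniform
# weights, so every joint lands at the grid centroid):
ys, xs = torch.meshgrid(torch.arange(4.0), torch.arange(4.0), indexing="ij")
anchors = torch.stack([ys.flatten(), xs.flatten()], dim=1)  # (16, 2)
joints = anchor_to_joint(anchors, torch.zeros(16, 21, 2), torch.zeros(16, 21))
```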
- Ego2HandsPose: A Dataset for Egocentric Two-hand 3D Global Pose Estimation [0.0]
Ego2HandsPose is the first dataset that enables color-based two-hand 3D tracking in unseen domains.
We develop a set of parametric fitting algorithms that enable 1) 3D hand pose annotation from a single image, 2) automatic conversion from 2D to 3D hand poses, and 3) accurate two-hand tracking with temporal consistency; a minimal fitting loop is sketched below.
arXiv Detail & Related papers (2022-06-10T07:50:45Z)
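As a minimal stand-in for such parametric fitting, the loop below fits 3D joints to observed 2D keypoints by gradient descent on a reprojection loss plus a bone-length prior. The dataset's actual algorithms fit a full parametric hand model; the pinhole camera, initialization, and loss weights here are assumptions.

```python
import torch

def project(joints_3d, focal=1000.0):
    # simple pinhole projection with the principal point at the origin (assumption)
    return focal * joints_3d[:, :2] / joints_3d[:, 2:3]

def fit(kp_2d, bones, bone_lengths, steps=200):
    """kp_2d: (J, 2) observed keypoints; bones: list of (parent, child) index pairs;
    bone_lengths: (len(bones),) target segment lengths in meters."""
    joints = torch.zeros(kp_2d.shape[0], 3, requires_grad=True)
    with torch.no_grad():
        joints[:, 2] = 0.5  # start half a meter in front of the camera
    opt = torch.optim.Adam([joints], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        reproj = ((project(joints) - kp_2d) ** 2).mean()
        seg = torch.stack([joints[c] - joints[p] for p, c in bones])
        prior = ((seg.norm(dim=1) - bone_lengths) ** 2).mean()
        (reproj + 10.0 * prior).backward()
        opt.step()
    return joints.detach()
```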
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from a single monocular RGB image.
Our method generates 3D hand meshes with both precise 3D poses and minimal collisions (an illustrative collision penalty is sketched below).
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
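The paper's factorized refinement is more elaborate, but a generic collision term such a refinement could minimize can be sketched by approximating each hand with spheres around its joints and penalizing overlap between the two hands; the sphere radius is an assumption.

```python
import torch

def collision_penalty(joints_a, joints_b, radius=0.01):
    """joints_a, joints_b: (J, 3) joint positions of the two hands, in meters."""
    dists = torch.cdist(joints_a, joints_b)          # (J, J) pairwise distances
    penetration = (2 * radius - dists).clamp(min=0)  # positive where spheres overlap
    return (penetration ** 2).sum()

# Added to a pose-fitting loss, this term lets gradient steps push
# interpenetrating hand parts apart during refinement.
```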
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset; a segmentation-guided pooling sketch follows below.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
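One simple way per-pixel part probabilities can disambiguate the hands is to pool image features per part, weighted by the segmentation, before pose regression. The sketch below illustrates only this pooling idea; it is not the DIGIT architecture, and the shapes are assumptions.

```python
import torch

def part_pooled_features(features, part_logits):
    """features: (C, H, W) image features; part_logits: (P, H, W) per-part scores.

    Returns (P, C): one feature vector per hand part, weighted by how likely
    each pixel is to belong to that part.
    """
    probs = torch.softmax(part_logits.flatten(1), dim=1)  # (P, H*W), sums to 1 per part
    return probs @ features.flatten(1).T                  # (P, H*W) @ (H*W, C) -> (P, C)
```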
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- Reconstructing Hand-Object Interactions in the Wild [71.16013096764046]
We propose an optimization-based procedure which does not require direct 3D supervision.
We exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction.
Our method produces compelling reconstructions on challenging in-the-wild data from the EPIC Kitchens and 100 Days of Hands datasets; the shape of the weighted objective is sketched below.
arXiv Detail & Related papers (2020-12-17T18:59:58Z)
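Schematically, such a procedure folds every available cue into one weighted objective minimized over the hand and object parameters. The term names below are placeholders standing in for the paper's actual constraints.

```python
def total_loss(params, terms, weights):
    """terms: dict mapping a cue name to a callable(params) -> scalar loss;
    weights: dict mapping the same names to scalar balancing weights."""
    return sum(weights[name] * fn(params) for name, fn in terms.items())

# e.g. terms = {"kp2d": reprojection_loss, "mask": silhouette_loss,
#               "shape": object_model_loss, "prior": mocap_pose_prior}
# with each weight controlling how strongly that cue constrains the solution.
```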
- Two-hand Global 3D Pose Estimation Using Monocular RGB [0.0]
We tackle the challenging task of estimating global 3D joint locations for both hands via only monocular RGB input images.
We propose a novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands.
We present the first work that achieves accurate global 3D hand tracking on both hands using RGB-only inputs.
arXiv Detail & Related papers (2020-06-01T23:53:52Z)
- HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation [7.559220068352681]
We propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time.
Our network uses a cascade of two adaptive graph convolutional neural networks: one estimates the 2D coordinates of the hand joints and object corners, and a second converts those 2D coordinates to 3D (see the sketch below).
arXiv Detail & Related papers (2020-03-31T19:01:42Z)
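A condensed sketch of such a cascade appears below: a first graph network residually refines the 2D points, and a second lifts them to 3D. The adjacency matrix, layer sizes, and the "adaptive" mechanism are simplified assumptions rather than the HOPE-Net definition.

```python
import torch

class GraphConv(torch.nn.Module):
    """One graph convolution: neighbor aggregation followed by a linear map."""

    def __init__(self, in_dim, out_dim, act=True):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)
        self.act = act

    def forward(self, x, adj):
        h = self.lin(adj @ x)  # adj: (N, N) normalized hand/object skeleton adjacency
        return torch.relu(h) if self.act else h

class Cascade(torch.nn.Module):
    """Stage 1 refines 2D joint/corner coordinates; stage 2 lifts them to 3D."""

    def __init__(self, hidden=64):
        super().__init__()
        self.refine = GraphConv(2, 2, act=False)
        self.lift1 = GraphConv(2, hidden)
        self.lift2 = GraphConv(hidden, 3, act=False)

    def forward(self, pts2d, adj):
        pts2d = pts2d + self.refine(pts2d, adj)                # residual 2D refinement
        return pts2d, self.lift2(self.lift1(pts2d, adj), adj)  # (N, 2) and (N, 3)
```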
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to the person's limbs.
It operates by first detecting 2D poses from the two signals and then lifting them to 3D space; a triangulation-based sketch of the lifting step follows this entry.
This simple two-step approach reduces the error of the state of the art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
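A small sketch of the lifting step under standard multi-view geometry: per-view 2D detections of one joint are triangulated linearly (DLT). The paper's geometric approach additionally fuses IMU orientations to constrain the limbs; that part is omitted here.

```python
import numpy as np

def triangulate(proj_mats, points_2d):
    """proj_mats: list of (3, 4) camera projection matrices;
    points_2d: (V, 2) detections of the same joint in each of V views.
    Returns the (3,) 3D point minimizing the linear DLT residual."""
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        rows.append(x * P[2] - P[0])  # x * (p3 . X) - (p1 . X) = 0
        rows.append(y * P[2] - P[1])  # y * (p3 . X) - (p2 . X) = 0
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]           # null-space direction = homogeneous solution
    return X[:3] / X[3]  # dehomogenize
```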