Monocular 3D Reconstruction of Interacting Hands via Collision-Aware
Factorized Refinements
- URL: http://arxiv.org/abs/2111.00763v1
- Date: Mon, 1 Nov 2021 08:24:10 GMT
- Title: Monocular 3D Reconstruction of Interacting Hands via Collision-Aware
Factorized Refinements
- Authors: Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy
- Abstract summary: We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
- Score: 96.40125818594952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D interacting hand reconstruction is essential for human-machine
interaction and human behavior understanding. Previous works in this field
either rely on auxiliary inputs such as depth images or can handle only a
single hand when monocular RGB images are used. Single-hand methods tend to
produce colliding hand meshes when applied to closely interacting hands,
since they cannot explicitly model the interaction between the two hands. In this
paper, we make the first attempt to reconstruct 3D interacting hands from
monocular single RGB images. Our method can generate 3D hand meshes with both
precise 3D poses and minimal collisions. This is made possible via a two-stage
framework. Specifically, the first stage adopts a convolutional neural network
to generate coarse predictions that tolerate collisions but encourage
pose-accurate hand meshes. The second stage progressively ameliorates the
collisions through a series of factorized refinements while preserving the
precision of the 3D poses. We carefully investigate potential implementations for
the factorized refinement, considering the trade-off between efficiency and
accuracy. Extensive quantitative and qualitative results on large-scale
datasets such as InterHand2.6M demonstrate the effectiveness of the proposed
approach.
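
For intuition, here is a minimal PyTorch sketch of the second-stage idea under simplifying assumptions: starting from the coarse stage-one poses, take gradient steps that shrink an inter-hand penetration penalty while an anchoring term keeps the result close to the initial estimate. The names `mano_layer`, `penetration_penalty`, and `refine` are hypothetical, the per-vertex distance test is a crude stand-in for a real interpenetration measure, and the paper's factorization of the refinement into separate sub-steps is not reproduced here.

```python
import torch

def penetration_penalty(verts_l, verts_r, margin=0.002):
    # Crude collision term (an assumption, not the paper's exact loss):
    # penalize left/right vertex pairs that come closer than `margin`.
    d = torch.cdist(verts_l, verts_r)           # (Nl, Nr) pairwise distances
    return torch.relu(margin - d).sum()

def refine(theta_l, theta_r, mano_layer, steps=50, lr=1e-2, w_anchor=1.0):
    # Stage-2 sketch: descend on the collision penalty while an anchoring
    # term keeps both poses near the coarse stage-1 estimates.
    theta_l0, theta_r0 = theta_l.detach(), theta_r.detach()
    theta_l = theta_l0.clone().requires_grad_(True)
    theta_r = theta_r0.clone().requires_grad_(True)
    opt = torch.optim.Adam([theta_l, theta_r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        verts_l = mano_layer(theta_l)           # assumed: pose params -> (N, 3) vertices
        verts_r = mano_layer(theta_r)
        loss = (penetration_penalty(verts_l, verts_r)
                + w_anchor * ((theta_l - theta_l0).pow(2).sum()
                              + (theta_r - theta_r0).pow(2).sum()))
        loss.backward()
        opt.step()
    return theta_l.detach(), theta_r.detach()
```

In the paper, the single joint update above would be replaced by the series of factorized refinements the abstract describes, which is where the stated efficiency/accuracy trade-off arises.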
Related papers
- 3D Pose Estimation of Two Interacting Hands from a Monocular Event
Camera [59.846927201816776]
This paper introduces the first framework for 3D tracking of two fast-moving and interacting hands from a single monocular event camera.
Our approach tackles the left-right hand ambiguity with a novel semi-supervised feature-wise attention mechanism and integrates an intersection loss to fix hand collisions.
arXiv Detail & Related papers (2023-12-21T18:59:57Z)
- Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction [33.661745138578596]
We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image.
Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap.
We use appearance and spatial encodings of these locations as input to a transformer, and leverage its attention mechanism to sort out the correct configuration of the joints (a toy version of the extrema-extraction step is sketched after this list).
arXiv Detail & Related papers (2021-04-29T20:19:20Z)
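
The extrema-of-a-heatmap step in the HandsFormer entry above is a generic operation worth making concrete. Below is a minimal PyTorch sketch assuming a single (H, W) joint heatmap; `heatmap_extrema` is a hypothetical helper for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def heatmap_extrema(heatmap, k=20, threshold=0.1):
    # Return up to k (row, col, score) local maxima of an (H, W) heatmap,
    # using a max-pooled copy as the local-maximum test. Assumes H * W >= k.
    h = heatmap[None, None]                                # (1, 1, H, W)
    pooled = F.max_pool2d(h, kernel_size=3, stride=1, padding=1)
    peaks = ((h == pooled) & (h > threshold)).float() * h  # keep peak scores only
    scores, idx = peaks.flatten().topk(k)
    width = heatmap.shape[1]
    return [(int(i) // width, int(i) % width, float(s))
            for s, i in zip(scores, idx) if s > 0]
```

For example, `heatmap_extrema(torch.rand(64, 64))` yields candidate 2D joint locations that would then be embedded and passed to the transformer.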