RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB
Video
- URL: http://arxiv.org/abs/2106.11725v1
- Date: Tue, 22 Jun 2021 12:53:56 GMT
- Title: RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB
Video
- Authors: Jiayi Wang, Franziska Mueller, Florian Bernard, Suzanne Sorli,
Oleksandr Sotnychenko, Neng Qian, Miguel A. Otaduy, Dan Casas and Christian
Theobalt
- Abstract summary: We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
- Score: 76.86512780916827
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tracking and reconstructing the 3D pose and geometry of two hands in
interaction is a challenging problem that is highly relevant for several
human-computer interaction applications, including AR/VR, robotics, and sign
language recognition. Existing works are either limited to simpler tracking
settings (e.g., considering only a single hand or two spatially separated
hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast,
in this work we present the first real-time method for motion capture of
skeletal pose and 3D surface geometry of hands from a single RGB camera that
explicitly considers close interactions. In order to address the inherent depth
ambiguities in RGB data, we propose a novel multi-task CNN that regresses
multiple complementary pieces of information, including segmentation, dense
matchings to a 3D hand model, and 2D keypoint positions, together with newly
proposed intra-hand relative depth and inter-hand distance maps. These
predictions are subsequently used in a generative model fitting framework in
order to estimate pose and shape parameters of a 3D hand model for both hands.
We experimentally verify the individual components of our RGB two-hand tracking
and 3D reconstruction pipeline through an extensive ablation study. Moreover,
we demonstrate that our approach offers previously unseen two-hand tracking
performance from RGB, and quantitatively and qualitatively outperforms existing
RGB-based methods that were not explicitly designed for two-hand interactions.
Finally, our method even performs on par with depth-based real-time methods.
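To make the described pipeline more concrete, below is a minimal PyTorch sketch of a multi-task CNN in the spirit of the abstract: one shared encoder with a lightweight decoder head per predicted quantity (segmentation, dense matchings to a hand model, 2D keypoint heatmaps, intra-hand relative depth, and inter-hand distance). The class name, backbone, head structure, and channel counts are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiTaskHandNet(nn.Module):
    """Hypothetical sketch of the multi-task prediction stage: a shared
    encoder feeding one small decoder head per task. All sizes and names
    are assumptions for illustration only."""

    def __init__(self, num_joints=21):
        super().__init__()
        # Shared convolutional encoder (placeholder backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )

        def head(out_channels):
            # One lightweight decoder head per predicted quantity.
            return nn.Sequential(
                nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, out_channels, 1),
            )

        self.seg = head(3)                       # background / left hand / right hand
        self.dense_match = head(3)               # dense matchings to a 3D hand model
        self.keypoints2d = head(2 * num_joints)  # 2D keypoint heatmaps, both hands
        self.intra_depth = head(2)               # intra-hand relative depth, per hand
        self.inter_dist = head(1)                # inter-hand distance map

    def forward(self, rgb):  # rgb: (B, 3, H, W)
        features = self.encoder(rgb)
        return {
            "segmentation": self.seg(features),
            "dense_matchings": self.dense_match(features),
            "keypoints_2d": self.keypoints2d(features),
            "intra_hand_depth": self.intra_depth(features),
            "inter_hand_distance": self.inter_dist(features),
        }

net = MultiTaskHandNet()
predictions = net(torch.randn(1, 3, 256, 256))  # dummy RGB frame
```

In the paper's pipeline, such per-pixel predictions then serve as data terms in a generative model-fitting stage that optimizes the pose and shape parameters of a 3D hand model for both hands; that fitting energy is not sketched here.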
Related papers
- SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition [5.359837526794863]
Hand pose represents key information for action recognition in the egocentric perspective.
We propose to improve egocentric 3D hand pose estimation based on RGB frames only by using pseudo-depth images.
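The summary above is terse, so here is a minimal sketch of the general pseudo-depth idea: predict relative depth from an RGB frame with an off-the-shelf monocular depth network and keep only the nearest pixels, mimicking a segmentation of hands and arms by range in an egocentric view. MiDaS is used purely as a stand-in estimator, and the function name and thresholding heuristic are our assumptions, not SHARP's actual pipeline.

```python
import torch

# Off-the-shelf monocular depth estimator, used here as a stand-in;
# SHARP's actual pseudo-depth source may differ.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def near_range_mask(rgb_image, near_fraction=0.3):
    """Predict pseudo-depth for an RGB frame (numpy HxWx3) and keep the
    nearest `near_fraction` of the inverse-depth range -- a crude proxy
    for segmenting hands/arms by range in an egocentric view."""
    with torch.no_grad():
        inv_depth = midas(transform(rgb_image)).squeeze(0)  # higher = closer
    lo, hi = inv_depth.min(), inv_depth.max()
    # Note: the mask lives at the network's output resolution; resizing
    # back to the input frame size is omitted for brevity.
    return inv_depth > hi - near_fraction * (hi - lo)
```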
arXiv Detail & Related papers (2024-08-19T14:30:29Z) - Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel generalizable 3D relative pose estimation method that combines (i) a 2.5D shape from an RGB-D reference, (ii) an off-the-shelf differentiable renderer, and (iii) semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - 3D Pose Estimation of Two Interacting Hands from a Monocular Event
Camera [59.846927201816776]
This paper introduces the first framework for 3D tracking of two fast-moving and interacting hands from a single monocular event camera.
Our approach tackles the left-right hand ambiguity with a novel semi-supervised feature-wise attention mechanism and integrates an intersection loss to fix hand collisions.
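As a generic illustration of what an intersection loss can look like, the sketch below penalizes interpenetration between two hands, each approximated by a set of proxy spheres. This is a common formulation of such penalties; the paper's exact loss may differ, and all names here are ours.

```python
import torch

def intersection_loss(centers_a, radii_a, centers_b, radii_b):
    """Sphere-proxy penetration penalty between two hands.
    centers_*: (N, 3) sphere centers; radii_*: (N,) sphere radii.
    Any sphere pair whose centers are closer than the sum of their
    radii contributes a squared penetration depth."""
    dists = torch.cdist(centers_a, centers_b)          # (Na, Nb) center distances
    min_allowed = radii_a[:, None] + radii_b[None, :]  # (Na, Nb) contact distances
    penetration = torch.clamp(min_allowed - dists, min=0.0)
    return penetration.pow(2).sum()
```

Added with a small weight to the overall training or fitting objective, such a term pushes interpenetrating left/right-hand poses apart.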
arXiv Detail & Related papers (2023-12-21T18:59:57Z) - Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z) - Decoupled Iterative Refinement Framework for Interacting Hands
Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z) - Monocular 3D Reconstruction of Interacting Hands via Collision-Aware
Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from a single monocular RGB image.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z) - H2O: Two Hands Manipulating Objects for First Person Interaction
Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z) - Two-hand Global 3D Pose Estimation Using Monocular RGB [0.0]
We tackle the challenging task of estimating global 3D joint locations for both hands via only monocular RGB input images.
We propose a novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands.
We present the first work that achieves accurate global 3D hand tracking on both hands using RGB-only inputs.
arXiv Detail & Related papers (2020-06-01T23:53:52Z)