MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand
Pose Synthesis
- URL: http://arxiv.org/abs/2010.01158v1
- Date: Fri, 2 Oct 2020 18:27:34 GMT
- Title: MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand
Pose Synthesis
- Authors: Zhenyu Wu, Duc Hoang, Shih-Yao Lin, Yusheng Xie, Liangjian Chen,
Yen-Yu Lin, Zhangyang Wang, Wei Fan
- Abstract summary: Estimating the 3D hand pose from a monocular RGB image is important but challenging.
A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
- Score: 81.40640219844197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the 3D hand pose from a monocular RGB image is important but
challenging. A solution is training on large-scale RGB hand images with
accurate 3D hand keypoint annotations. However, acquiring such annotations is
too expensive in practice. Instead, we have developed a learning-based approach to synthesize
realistic, diverse, and 3D pose-preserving hand images under the guidance of 3D
pose information. We propose a 3D-aware multi-modal guided hand generative
network (MM-Hand), together with a novel geometry-based curriculum learning
strategy. Our extensive experimental results demonstrate that the 3D-annotated
images generated by MM-Hand qualitatively and quantitatively outperform
existing options. Moreover, the augmented data can consistently improve the
quantitative performance of the state-of-the-art 3D hand pose estimators on two
benchmark datasets. The code will be available at
https://github.com/ScottHoang/mm-hand.
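The abstract describes what MM-Hand produces (pose-preserving synthetic hand images used to augment pose-estimator training) but not its architecture or losses. The sketch below is therefore only a minimal, generic pose-conditioned generator illustrating the augmentation idea; the class name PoseConditionedGenerator, the joint-heatmap conditioning, the noise injection, and all layer sizes are assumptions, and the adversarial training that would make the outputs realistic is omitted.

```python
# Minimal sketch (NOT MM-Hand's actual network): a generator conditioned on a
# rendered 3D-pose map, so synthesized images preserve the input pose and can
# be paired with that pose as extra 3D-annotated training data.
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):  # hypothetical name
    """Maps per-joint pose heatmaps plus a noise code to an RGB hand image."""

    def __init__(self, num_joints: int = 21, noise_dim: int = 64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Conv2d(num_joints + noise_dim, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
            nn.Tanh(),  # RGB in [-1, 1]
        )

    def forward(self, pose_maps: torch.Tensor) -> torch.Tensor:
        b, _, h, w = pose_maps.shape
        # Broadcast one noise vector per image over the grid for appearance diversity.
        z = torch.randn(b, self.noise_dim, 1, 1, device=pose_maps.device)
        return self.net(torch.cat([pose_maps, z.expand(-1, -1, h, w)], dim=1))

# Usage: render each 3D pose to heatmaps, synthesize images, and append the
# resulting (image, 3D pose) pairs to the hand-pose estimator's training set.
gen = PoseConditionedGenerator()
pose_maps = torch.rand(4, 21, 64, 64)  # stand-in for rendered pose maps
fake_hands = gen(pose_maps)            # (4, 3, 64, 64)
```

The paper's geometry-based curriculum learning strategy would additionally order training samples, presumably from geometrically easy to hard poses; that ordering is independent of the generator sketched here.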
Related papers
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image, based on a 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in the camera frustum (a toy sketch of such voting follows this entry).
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
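As a toy illustration of the dense point-wise voting named above (the general technique, not the paper's exact formulation), each 3D point sampled in the camera frustum can predict a per-joint offset and a confidence, and each joint is recovered as the confidence-weighted average of the votes. The function name and tensor shapes below are assumptions.

```python
# Toy dense point-wise voting (an assumed formulation, not the paper's):
# every sampled frustum point casts an offset vote for each joint.
import torch

def aggregate_votes(points: torch.Tensor, offsets: torch.Tensor,
                    confidence: torch.Tensor) -> torch.Tensor:
    """points: (N, 3); offsets: (N, J, 3); confidence: (N, J) raw scores.
    Returns (J, 3) camera-space joint positions."""
    votes = points[:, None, :] + offsets        # (N, J, 3) candidate positions
    weights = torch.softmax(confidence, dim=0)  # normalize over the N voters
    return (weights[..., None] * votes).sum(dim=0)

# Usage with random stand-ins for a network's per-point predictions (21 joints).
pts = torch.rand(1024, 3)
joints = aggregate_votes(pts, torch.randn(1024, 21, 3), torch.randn(1024, 21))
```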
- Consistent 3D Hand Reconstruction in Video via self-supervised Learning [67.55449194046996]
We present a method for reconstructing accurate and consistent 3D hands from a monocular video.
Detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand.
We propose S2HAND, a self-supervised 3D hand reconstruction model.
arXiv Detail & Related papers (2022-01-24T09:44:11Z)
- 3D Hand Pose and Shape Estimation from RGB Images for Improved Keypoint-Based Hand-Gesture Recognition [25.379923604213626]
This paper presents a keypoint-based end-to-end framework for 3D hand pose and shape estimation.
It is successfully applied to the hand-gesture recognition task as a case study.
arXiv Detail & Related papers (2021-09-28T17:07:43Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
- HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on the NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z)
- Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at an unprecedented runtime of 100 fps.
This is enabled by a new learning-based architecture designed to make use of all available sources of hand training data.
It features a 3D hand joint detection module and an inverse kinematics module that not only regresses 3D joint positions but also maps them to joint rotations in a single feed-forward pass (a minimal sketch of this idea follows this entry).
arXiv Detail & Related papers (2020-03-21T03:51:54Z)
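A hedged sketch of the single-pass inverse-kinematics idea mentioned above, not the paper's network: a small MLP regresses a 6D rotation parameterization per joint directly from detected 3D joint positions, and Gram-Schmidt orthogonalization converts it into valid rotation matrices. The class name FeedForwardIK and the layer sizes are illustrative assumptions.

```python
# Assumed illustration of feed-forward IK: joint positions in, joint rotations
# out, with no iterative optimization at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardIK(nn.Module):  # hypothetical name
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * 3, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_joints * 6),  # two 3-vectors per joint (6D rotation)
        )

    def forward(self, joints_3d: torch.Tensor) -> torch.Tensor:
        """joints_3d: (B, J, 3) -> per-joint rotation matrices (B, J, 3, 3)."""
        x = self.mlp(joints_3d.flatten(1)).view(-1, self.num_joints, 2, 3)
        # Gram-Schmidt: orthonormalize the two predicted rows, then complete
        # the right-handed frame with a cross product.
        a = F.normalize(x[..., 0, :], dim=-1)
        b = x[..., 1, :] - (a * x[..., 1, :]).sum(-1, keepdim=True) * a
        b = F.normalize(b, dim=-1)
        c = torch.cross(a, b, dim=-1)
        return torch.stack([a, b, c], dim=-2)

rotations = FeedForwardIK()(torch.rand(2, 21, 3))  # one pass, (2, 21, 3, 3)
```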
- Silhouette-Net: 3D Hand Pose Estimation from Silhouettes [16.266199156878056]
Existing approaches mainly consider different input modalities and settings, such as monocular RGB, multi-view RGB, depth, or point cloud.
We present a new architecture that automatically learns guidance from implicit depth perception and resolves hand-pose ambiguity through end-to-end training.
arXiv Detail & Related papers (2019-12-28T10:29:42Z)