MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand
Pose Estimation
- URL: http://arxiv.org/abs/2012.03206v1
- Date: Sun, 6 Dec 2020 07:55:08 GMT
- Title: MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand
Pose Estimation
- Authors: Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, and Xiaohui Xie
- Abstract summary: Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity makes the problem ill-posed.
We design a spin match algorithm that enables matching a rigid mesh model to any target mesh ground truth.
We present a multi-view hand pose estimation approach to verify that training a hand pose estimator on our generated dataset greatly enhances performance.
- Score: 32.12879364117658
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Estimating 3D hand poses from a single RGB image is challenging because depth
ambiguity renders the problem ill-posed. Training hand pose estimators with 3D
hand mesh annotations and multi-view images often results in significant
performance gains. However, existing multi-view datasets are relatively small,
with hand joints annotated by off-the-shelf trackers or automated through model
predictions, both of which may be inaccurate and can introduce biases.
Collecting large-scale multi-view hand images with accurate mesh and joint
annotations is valuable but strenuous. In this paper, we design a spin match
algorithm that enables matching a rigid mesh model to any target mesh ground
truth. Based on this match algorithm, we propose an efficient pipeline to
generate a large-scale multi-view hand mesh (MVHM) dataset with accurate 3D
hand mesh and joint labels. We further present a multi-view hand pose
estimation approach to verify that training a hand pose estimator on our
generated dataset greatly enhances the performance. Experimental results show
that our approach achieves an $\text{AUC}_{\text{20-50}}$ of 0.990 on the MHP
dataset, compared to the previous state of the art of 0.939. Our dataset is
publicly available at https://github.com/Kuzphi/MVHM.
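For context, $\text{AUC}_{\text{20-50}}$ is the normalized area under the 3D-PCK curve, where PCK at threshold $t$ is the fraction of predicted joints lying within $t$ mm of the ground truth, and $t$ is swept from 20 mm to 50 mm. Below is a minimal numpy sketch of this standard metric; the array shapes and function names are illustrative assumptions, not code from the MVHM repository.

```python
import numpy as np

def auc_20_50(pred_joints: np.ndarray, gt_joints: np.ndarray, steps: int = 100) -> float:
    """Normalized area under the 3D-PCK curve for thresholds in [20, 50] mm.

    pred_joints, gt_joints: (N, 21, 3) arrays of 3D joint positions in mm.
    """
    # Per-joint Euclidean errors, flattened over all samples and joints.
    errors = np.linalg.norm(pred_joints - gt_joints, axis=-1).ravel()
    thresholds = np.linspace(20.0, 50.0, steps)
    # PCK(t): fraction of joints with error below threshold t.
    pck = np.array([(errors < t).mean() for t in thresholds])
    # Trapezoidal integration, normalized so a perfect estimator scores 1.0.
    return float(np.trapz(pck, thresholds) / (thresholds[-1] - thresholds[0]))
```

Under this metric, the reported 0.990 means the PCK curve is nearly saturated across the entire 20-50 mm threshold range.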
Related papers
- Monocular 3D Hand Mesh Recovery via Dual Noise Estimation [47.82179706128616]
We introduce a dual noise estimation method to generate meshes that are well aligned with the image.
Our method achieves state-of-the-art performance on the large-scale Interhand2.6M dataset.
arXiv Detail & Related papers (2023-12-26T07:21:01Z)
- Reconstructing Hands in 3D with Transformers [64.15390309553892]
We present an approach that can reconstruct hands in 3D from monocular input.
Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work.
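As a rough sketch of what a "fully transformer-based architecture" for hand recovery looks like, the toy model below patchifies the image ViT-style, encodes the tokens with a transformer, and regresses a single parameter vector (e.g., pose, shape, and camera). All sizes are illustrative assumptions; this is not HaMeR's actual implementation.

```python
import torch
import torch.nn as nn

class TransformerHandRegressor(nn.Module):
    """Toy fully transformer-based hand regressor (HaMeR-like only in spirit)."""
    def __init__(self, patch: int = 16, dim: int = 256, n_params: int = 61):
        super().__init__()
        # ViT-style patch embedding: one token per image patch
        # (positional embeddings omitted for brevity).
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Regress one parameter vector, e.g. MANO pose + shape + camera.
        self.head = nn.Linear(dim, n_params)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        tokens = self.patchify(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))  # pool tokens, regress parameters
```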
arXiv Detail & Related papers (2023-12-08T18:59:07Z) - PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape
Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM jointly reconstructs the 3D object and estimates the relative camera poses in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z) - End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh
Reconstruction from a Single RGB Image [9.238322841389994]
We propose a single-stage pipeline for multi-hand reconstruction.
Specifically, we design a multi-head auto-encoder structure in which the head networks share the same feature map and respectively output the hand center, pose, and texture; a minimal sketch of this design follows below.
Our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings.
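To make the shared-feature, multi-head idea concrete, here is a minimal PyTorch sketch in which one encoder feature map feeds three heads for the hand center, pose, and texture. Layer sizes and output dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiHeadHandNet(nn.Module):
    """Shared backbone feature map feeding three task-specific heads."""
    def __init__(self, feat_ch: int = 256, pose_dim: int = 48, tex_dim: int = 10):
        super().__init__()
        # Shared encoder (stand-in for the paper's backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Head 1: per-pixel hand-center heatmap.
        self.center_head = nn.Conv2d(feat_ch, 1, kernel_size=1)
        # Heads 2 and 3: global pose / texture vectors from pooled features.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.pose_head = nn.Linear(feat_ch, pose_dim)
        self.tex_head = nn.Linear(feat_ch, tex_dim)

    def forward(self, img: torch.Tensor):
        feat = self.encoder(img)             # one shared feature map
        center = self.center_head(feat)      # (B, 1, H/4, W/4) heatmap
        pooled = self.pool(feat).flatten(1)  # (B, feat_ch)
        return center, self.pose_head(pooled), self.tex_head(pooled)
```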
arXiv Detail & Related papers (2022-04-18T03:57:14Z)
- Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction [57.3636347704271]
3D hand-mesh reconstruction from RGB images facilitates many applications, including augmented reality (AR).
This paper presents a novel pipeline that decouples the hand-mesh reconstruction task into three stages.
This decoupling promotes high-quality finger-level mesh-image alignment while letting the stages run together to deliver real-time predictions.
arXiv Detail & Related papers (2021-09-03T20:42:01Z)
- HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first is a 3D voxelized grid of the hand shape, which does not preserve the mesh topology; the second is the hand surface itself.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape, either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or with a classical segment-wise Non-Rigid Gravitational Approach (NRGA++).
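As a concrete illustration of the voxelized representation (and of why it discards mesh topology), the sketch below maps mesh vertices into a binary occupancy grid without ever consulting the face connectivity. The grid size and normalization are illustrative assumptions.

```python
import numpy as np

def voxelize_vertices(vertices: np.ndarray, grid_size: int = 64) -> np.ndarray:
    """Map mesh vertices (V, 3) into a binary occupancy grid (grid_size^3).

    Topology (the mesh faces) is intentionally unused: the voxel grid
    only records which cells contain surface points.
    """
    # Normalize vertices into the unit cube [0, 1).
    v_min, v_max = vertices.min(axis=0), vertices.max(axis=0)
    normalized = (vertices - v_min) / (v_max - v_min + 1e-8)
    # Scale to grid indices and clip to the valid range.
    idx = np.clip((normalized * grid_size).astype(int), 0, grid_size - 1)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid
```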
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
- Im2Mesh GAN: Accurate 3D Hand Mesh Recovery from a Single RGB Image [31.371190180801452]
We show that the hand mesh can be learned directly from the input image.
We propose a new type of GAN called Im2Mesh GAN to learn the mesh through end-to-end adversarial training.
arXiv Detail & Related papers (2021-01-27T07:38:01Z)
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
One solution is to train on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z)
- BiHand: Recovering Hand Mesh with Multi-stage Bisected Hourglass Networks [37.65510556305611]
We introduce an end-to-end learnable model, BiHand, which consists of three cascaded stages: a 2D seeding stage, a 3D lifting stage, and a mesh generation stage (see the sketch after this entry).
At the output of BiHand, the full hand mesh is recovered using the joint rotations and shape parameters predicted by the network.
Our model achieves superior accuracy compared with state-of-the-art methods and produces appealing 3D hand meshes even under severe conditions.
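The three cascaded stages can be pictured with the interface sketch below: 2D seeding produces joint heatmaps, 3D lifting converts their peaks to 3D joints, and a mesh stage maps the lifted joints plus shape parameters to vertices (BiHand itself predicts joint rotations for this step). The stage internals, including the linear stand-in for a parametric mesh layer, are assumptions for illustration, not BiHand's actual modules.

```python
import torch
import torch.nn as nn

class CascadedHandPipeline(nn.Module):
    """Illustrative 2D-seeding -> 3D-lifting -> mesh-generation cascade."""
    def __init__(self, n_joints: int = 21, shape_dim: int = 10, n_verts: int = 778):
        super().__init__()
        self.seed_2d = nn.Conv2d(3, n_joints, kernel_size=3, padding=1)  # joint heatmaps
        self.lift_3d = nn.Linear(n_joints * 2, n_joints * 3)             # 2D -> 3D joints
        # Stand-in for a parametric mesh layer (e.g., MANO-style): inputs -> vertices.
        self.mesh = nn.Linear(n_joints * 3 + shape_dim, n_verts * 3)

    def forward(self, img: torch.Tensor, shape: torch.Tensor):
        heatmaps = self.seed_2d(img)                    # stage 1: 2D seeding
        b, j, h, w = heatmaps.shape
        # Hard argmax for brevity; a real model would use a differentiable soft-argmax.
        flat = heatmaps.view(b, j, -1).argmax(-1)
        uv = torch.stack(
            (flat % w, torch.div(flat, w, rounding_mode="floor")), dim=-1
        ).float()                                       # (B, J, 2) peak coordinates
        joints3d = self.lift_3d(uv.flatten(1)).view(b, j, 3)  # stage 2: 3D lifting
        # Stage 3: mesh generation from lifted joints + shape parameters.
        verts = self.mesh(torch.cat((joints3d.flatten(1), shape), dim=1))
        return heatmaps, joints3d, verts.view(b, -1, 3)
```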
arXiv Detail & Related papers (2020-08-12T03:13:17Z)
- Multi-Person Absolute 3D Human Pose Estimation with Weak Depth Supervision [0.0]
We introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion.
Our algorithm is a monocular, multi-person, absolute pose estimator.
We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates.
arXiv Detail & Related papers (2020-04-08T13:29:22Z)
- HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z)