HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural
Networks
- URL: http://arxiv.org/abs/2107.01205v1
- Date: Fri, 2 Jul 2021 17:59:54 GMT
- Title: HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural
Networks
- Authors: Jameel Malik and Soshi Shimada and Ahmed Elhayek and Sk Aziz Ali and
Christian Theobalt and Vladislav Golyanik and Didier Stricker
- Abstract summary: HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology.
We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach (NRGA++) which
- Score: 71.09275975580009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D hand shape and pose estimation from a single depth map is a new and
challenging computer vision problem with many applications. Existing methods
addressing it directly regress hand meshes via 2D convolutional neural
networks, which leads to artifacts due to perspective distortions in the
images. To address the limitations of the existing methods, we develop
HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions
trained in a fully supervised manner. The input to our network is a 3D
voxelized-depth-map-based on the truncated signed distance function (TSDF).
HandVoxNet++ relies on two hand shape representations. The first one is the 3D
voxelized grid of hand shape, which does not preserve the mesh topology and
which is the most accurate representation. The second representation is the
hand surface that preserves the mesh topology. We combine the advantages of
both representations by aligning the hand surface to the voxelized hand shape
either with a new neural Graph-Convolutions-based Mesh Registration
(GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach
(NRGA++) which does not rely on training data. In extensive evaluations on
three public benchmarks, i.e., SynHand5M, depth-based HANDS19 challenge and
HO-3D, the proposed HandVoxNet++ achieves the state-of-the-art performance. In
this journal extension of our previous approach presented at CVPR 2020, we gain
41.09% and 13.7% higher shape alignment accuracy on SynHand5M and HANDS19
datasets, respectively. Our method is ranked first on the HANDS19 challenge
dataset (Task 1: Depth-Based 3D Hand Pose Estimation) at the moment of the
submission of our results to the portal in August 2020.
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z) - Reconstructing Hands in 3D with Transformers [64.15390309553892]
We present an approach that can reconstruct hands in 3D from monocular input.
Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work.
arXiv Detail & Related papers (2023-12-08T18:59:07Z) - GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh
Reconstruction from a Single RGB Image [9.238322841389994]
We propose a single-stage pipeline for multi-hand reconstruction.
Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose and texture.
Our method outperforms the state-of-the-art model-based methods in both weakly-supervised and fully-supervised manners.
arXiv Detail & Related papers (2022-04-18T03:57:14Z) - HandFoldingNet: A 3D Hand Pose Estimation Network Using
Multiscale-Feature Guided Folding of a 2D Hand Skeleton [4.1954750695245835]
This paper proposes HandFoldingNet, an accurate and efficient hand pose estimator.
The proposed model utilizes a folding-based decoder that folds a given 2D hand skeleton into the corresponding joint coordinates.
Experimental results show that the proposed model outperforms the existing methods on three hand pose benchmark datasets.
arXiv Detail & Related papers (2021-08-12T05:52:44Z) - MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand
Pose Estimation [32.12879364117658]
Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity leads the problem ill-posed.
We design a spin match algorithm that enables a rigid mesh model matching with any target mesh ground truth.
We present a multi-view hand pose estimation approach to verify that training a hand pose estimator with our generated dataset greatly enhances the performance.
arXiv Detail & Related papers (2020-12-06T07:55:08Z) - HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose
Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z) - HandAugment: A Simple Data Augmentation Method for Depth-Based 3D Hand
Pose Estimation [3.0969191504482247]
We propose HandAugment, a method to synthesize image data to augment the training process of the neural networks.
We show that our method achieves the first place in the task of depth-based 3D hand pose estimation in HANDS 2019 challenge.
arXiv Detail & Related papers (2020-01-03T03:00:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.