Mask2Hand: Learning to Predict the 3D Hand Pose and Shape from Shadow
- URL: http://arxiv.org/abs/2205.15553v1
- Date: Tue, 31 May 2022 06:04:27 GMT
- Title: Mask2Hand: Learning to Predict the 3D Hand Pose and Shape from Shadow
- Authors: Li-Jen Chang, Yu-Cheng Liao, Chia-Hui Lin, Hwann-Tzong Chen
- Abstract summary: Mask2Hand learns to solve the challenging task of predicting 3D hand pose and shape from a 2D binary mask of hand silhouette/shadow without additional manually-annotated data.
Experiments show that our method, which takes a single binary mask as input, achieves prediction accuracy comparable to that of state-of-the-art RGB- or depth-based methods in both unaligned and aligned settings.
- Score: 13.9320397231143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a self-trainable method, Mask2Hand, which learns to solve the
challenging task of predicting 3D hand pose and shape from a 2D binary mask of
hand silhouette/shadow without additional manually-annotated data. Given the
intrinsic camera parameters and the parametric hand model in the camera space,
we adopt the differentiable rendering technique to project 3D estimations onto
the 2D binary silhouette space. By applying a tailored combination of losses
between the rendered silhouette and the input binary mask, we are able to
integrate the self-guidance mechanism into our end-to-end optimization process
for constraining global mesh registration and hand pose estimation. The
experiments show that our method, which takes a single binary mask as the
input, achieves prediction accuracy comparable to that of state-of-the-art
methods that require RGB or depth inputs, in both unaligned and aligned settings.
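The self-guidance mechanism in the abstract hinges on comparing a differentiably rendered silhouette against the input binary mask. The abstract mentions only a "tailored combination of losses" without detail, so the sketch below shows one plausible silhouette-consistency loss (soft IoU plus binary cross-entropy); the loss form, function name, and array shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def silhouette_loss(rendered, target, eps=1e-6):
    """Toy silhouette-consistency loss: soft IoU plus binary
    cross-entropy between a (differentiably) rendered silhouette
    with values in [0, 1] and the input binary mask."""
    rendered = np.clip(rendered, eps, 1.0 - eps)
    # Soft intersection-over-union term penalizes misaligned silhouettes.
    inter = (rendered * target).sum()
    union = (rendered + target - rendered * target).sum()
    iou_loss = 1.0 - inter / (union + eps)
    # Per-pixel binary cross-entropy sharpens the silhouette boundary.
    bce = -(target * np.log(rendered)
            + (1.0 - target) * np.log(1.0 - rendered)).mean()
    return iou_loss + bce

# A silhouette that matches the mask exactly yields a near-zero loss.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
print(silhouette_loss(mask, mask) < 0.01)  # True
```

In the actual pipeline such a loss would be written in an autodiff framework so its gradient can flow back through the renderer into the hand-model parameters.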
Related papers
- MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild [11.39213280304101]
MaskHand is a novel generative masked model for hand mesh recovery.
It synthesizes plausible 3D hand meshes by learning and sampling from the probabilistic distribution of the ambiguous 2D-to-3D mapping process.
It achieves state-of-the-art accuracy, robustness, and realism in 3D hand mesh reconstruction.
arXiv Detail & Related papers (2024-12-18T00:10:00Z)
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z)
- Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves the state-of-the-art performance of semantic scene completion on two large-scale benchmark datasets MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Deep-MDS Framework for Recovering the 3D Shape of 2D Landmarks from a Single Image [8.368476827165114]
This paper proposes a framework to recover the 3D shape of 2D landmarks on a human face, in a single input image.
A deep neural network learns the pairwise dissimilarities among the 2D landmarks, which are then used by a non-metric multidimensional scaling (NMDS) approach.
arXiv Detail & Related papers (2022-10-27T06:20:10Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from a single monocular RGB image.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z)
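The "Lightweight Multi-View 3D Pose Estimation" entry above builds on recovering 3D points from spatially calibrated cameras. None of the abstracts gives implementation detail, but the classical building block is linear (DLT) triangulation; the sketch below is a minimal version, where the camera intrinsics and the test point are made-up values for illustration:

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 calibrated
    views: each observation (u, v) under a 3x4 projection matrix P
    contributes two rows to a homogeneous system A X = 0, which is
    solved in the least-squares sense via SVD."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]             # null vector of A (smallest singular value)
    return X[:3] / X[3]    # dehomogenize

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix, returning pixels."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras with identical intrinsics, the second shifted 1 unit
# along the x-axis (illustrative calibration values).
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.], [0.], [0.]])])

X_true = np.array([0.5, 0.2, 4.0])
obs = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_dlt([P1, P2], obs)
print(np.allclose(X_est, X_true))  # True
```

With noisy detections the same SVD solve returns the algebraic least-squares estimate, which is why DLT is the usual initialization before any learned or geometric refinement.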
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.