SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation
- URL: http://arxiv.org/abs/2402.01389v1
- Date: Fri, 2 Feb 2024 13:14:20 GMT
- Title: SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation
- Authors: Yinqiao Wang, Hao Xu, Pheng-Ann Heng, Chi-Wing Fu
- Abstract summary: Estimating a 3D hand mesh from RGB images is a longstanding task in which occlusion is one of the most challenging problems.
Existing attempts at this task often fail when occlusion dominates the image space.
We propose SiMA-Hand, which aims to boost mesh-reconstruction performance by Single-to-Multi-view Adaptation.
- Score: 90.59734612754222
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Estimating a 3D hand mesh from RGB images is a longstanding task, in which
occlusion is one of the most challenging problems. Existing attempts at this task
often fail when occlusion dominates the image space. In this paper, we propose
SiMA-Hand, which aims to boost mesh-reconstruction performance by
Single-to-Multi-view Adaptation. First, we design a multi-view hand reconstructor
that fuses information across multiple views by holistically adopting feature fusion
at the image, joint, and vertex levels. Then, we introduce a single-view hand
reconstructor equipped with SiMA. Though the single-view reconstructor takes only
one view as input at inference, its shape and orientation features can be enriched
by learning non-occluded knowledge from the extra views at training, enhancing the
reconstruction precision in the occluded regions. We conduct experiments on the
Dex-YCB and HanCo benchmarks with challenging object- and self-caused occlusion
cases, demonstrating that SiMA-Hand consistently achieves superior performance over
the state of the art. Code will be released at
https://github.com/JoyboyWang/SiMA-Hand_Pytorch.
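The core SiMA idea, enriching single-view features with multi-view knowledge at training time only, amounts to a feature-distillation objective. Below is a minimal PyTorch sketch of such a training step; the module layout, the toy encoder, and the L1/MSE loss choices are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleViewReconstructor(nn.Module):
    """Hypothetical single-view branch: image -> feature -> mesh vertices."""
    def __init__(self, feat_dim=256, num_verts=778):  # 778 = MANO vertex count
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(feat_dim, num_verts * 3)

    def forward(self, img):
        feat = self.encoder(img).flatten(1)           # (B, feat_dim)
        verts = self.head(feat).view(-1, 778, 3)      # (B, V, 3)
        return feat, verts

def sima_training_step(single_net, multi_view_feat, img, gt_verts, w_adapt=1.0):
    """One training step: supervise the mesh and pull the single-view
    feature toward the (detached) fused multi-view teacher feature."""
    feat, verts = single_net(img)
    loss_mesh = F.l1_loss(verts, gt_verts)
    # Adaptation term: the multi-view feature acts as a fixed teacher.
    loss_adapt = F.mse_loss(feat, multi_view_feat.detach())
    return loss_mesh + w_adapt * loss_adapt
```

At inference, only the single-view branch and one image would be needed; the fused multi-view feature and the adaptation term exist only during training.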
Related papers
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to increase data diversity and boost 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinct from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z) - 2L3: Lifting Imperfect Generated 2D Images into Accurate 3D [16.66666619143761]
- 2L3: Lifting Imperfect Generated 2D Images into Accurate 3D [16.66666619143761]
Multi-view (MV) 3D reconstruction is a promising solution to fuse generated MV images into consistent 3D objects.
However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, leading to poor reconstruction quality.
We present a novel 3D reconstruction framework that leverages intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation to address these three issues.
arXiv Detail & Related papers (2024-01-29T02:30:31Z) - MOHO: Learning Single-view Hand-held Object Reconstruction with
Multi-view Occlusion-Aware Supervision [75.38953287579616]
We present a novel framework to exploit Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction.
We tackle two predominant challenges in this setting: hand-induced occlusion and the object's self-occlusion.
Experiments on the HO3D and DexYCB datasets demonstrate that 2D-supervised MOHO outperforms 3D-supervised methods by a large margin.
arXiv Detail & Related papers (2023-10-18T03:57:06Z) - HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation [5.888156950854715]
- HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation [5.888156950854715]
We propose a novel self-supervised pre-training strategy for regressing 3D hand mesh parameters.
Our proposed approach, named HandMIM, achieves strong performance on various hand mesh estimation tasks.
arXiv Detail & Related papers (2023-07-29T19:46:06Z) - MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with
Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
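As its name suggests, HandMIM builds on masked image modeling; the entry does not spell out the recipe, so the following is only a generic patch-masking step one might use for such pre-training (patch size and mask ratio are assumptions):

```python
import torch

def random_patch_mask(images, patch=16, mask_ratio=0.6):
    """Split images into non-overlapping patches and zero out a random
    subset, returning the masked images and the boolean patch mask
    (a generic masked-image-modeling step, not HandMIM's exact recipe).
    images: (B, C, H, W) with H and W divisible by `patch`."""
    B, C, H, W = images.shape
    gh, gw = H // patch, W // patch
    mask = torch.rand(B, gh, gw) < mask_ratio                 # True = masked
    pixel_mask = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    masked = images * (~pixel_mask).unsqueeze(1)              # zero masked pixels
    return masked, mask
```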
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method concentrates on modeling regional geometry and incurs less ambiguity in masked reconstruction.
By combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas, we obtain a unified framework called MM-3DScene.
arXiv Detail & Related papers (2022-12-20T01:53:40Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D mesh reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh
Reconstruction from a Single RGB Image [9.238322841389994]
We propose a single-stage pipeline for multi-hand reconstruction.
Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose and texture.
Our method outperforms state-of-the-art model-based methods in both the weakly-supervised and fully-supervised settings.
arXiv Detail & Related papers (2022-04-18T03:57:14Z) - Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction [57.3636347704271]
- Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction [57.3636347704271]
3D hand-mesh reconstruction from RGB images facilitates many applications, including augmented reality (AR).
This paper presents a novel pipeline that decouples the hand-mesh reconstruction task into three stages.
This design promotes high-quality finger-level mesh-image alignment and jointly drives the models to deliver real-time predictions.
arXiv Detail & Related papers (2021-09-03T20:42:01Z) - Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware
Multi-view Geometry Consistency [40.56510679634943]
We propose a self-supervised training architecture that leverages multi-view geometry consistency.
We design three novel loss functions for multi-view consistency, including the pixel consistency loss, the depth consistency loss, and the facial landmark-based epipolar loss.
Our method is accurate and robust, especially under large variations of expressions, poses, and illumination conditions.
arXiv Detail & Related papers (2020-07-24T12:36:09Z)