MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand
Reconstruction
- URL: http://arxiv.org/abs/2303.15718v2
- Date: Mon, 17 Apr 2023 02:44:54 GMT
- Authors: Congyi Wang, Feida Zhu, Shilei Wen
- Abstract summary: We propose to reconstruct meshes and estimate MANO parameters of two hands from a single RGB image simultaneously.
The proposed Mesh-Mano Interaction Block (MMIB) consists of one graph residual block to aggregate local information and two transformer encoders to model long-range dependencies.
Experiments on the InterHand2.6M benchmark demonstrate promising results over the state-of-the-art hand reconstruction methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods proposed for hand reconstruction tasks usually parameterize
a generic 3D hand model or predict hand mesh positions directly. The parametric
representations consisting of hand shapes and rotational poses are more stable,
while the non-parametric methods can predict more accurate mesh positions. In
this paper, we propose to reconstruct meshes and estimate MANO parameters of
two hands from a single RGB image simultaneously to utilize the merits of two
kinds of hand representations. To fulfill this target, we propose novel
Mesh-Mano interaction blocks (MMIBs), which take mesh vertex positions and
MANO parameters as two kinds of query tokens. Each MMIB consists of one graph
residual block to aggregate local information and two transformer encoders to
model long-range dependencies. The transformer encoders are equipped with
different asymmetric attention masks to model the intra-hand and inter-hand
attention, respectively. Moreover, we introduce the mesh alignment refinement
module to further enhance the mesh-image alignment. Extensive experiments on
the InterHand2.6M benchmark demonstrate promising results over the
state-of-the-art hand reconstruction methods.
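The intra-hand and inter-hand attention described above can be illustrated with boolean attention masks over the two kinds of query tokens. The token layout ([left mesh | left MANO | right mesh | right MANO]) and the symmetric intra/inter partition below are illustrative assumptions, not the paper's actual asymmetric masking scheme:

```python
import numpy as np

def build_attention_masks(n_mesh: int, n_mano: int):
    """Sketch of intra-hand and inter-hand attention masks for one MMIB.

    Assumed token layout (not from the paper):
    [left mesh | left MANO | right mesh | right MANO].
    True = attention allowed between the row token and the column token.
    """
    per_hand = n_mesh + n_mano              # tokens belonging to one hand
    hand_id = np.repeat([0, 1], per_hand)   # which hand each token belongs to

    same_hand = hand_id[:, None] == hand_id[None, :]
    intra_mask = same_hand                  # attend only within the same hand
    inter_mask = ~same_hand                 # attend only across the two hands
    return intra_mask, inter_mask

# Example: 4 mesh-vertex tokens and 2 MANO-parameter tokens per hand.
intra, inter = build_attention_masks(n_mesh=4, n_mano=2)
print(intra.shape)  # (12, 12)
```

Masks of this form can be passed to a standard transformer encoder (e.g. as an additive or boolean attention mask) so that one encoder models intra-hand dependencies and the other models inter-hand dependencies, as the abstract describes.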
Related papers
- Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction (arXiv, 2023-05-01)
  Direct mesh fitting for 3D hand shape reconstruction is highly accurate. However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes. We introduce a novel weakly-supervised hand shape estimation framework that integrates non-parametric mesh fitting with the MANO model in an end-to-end fashion.
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes (arXiv, 2023-02-28)
  Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands. Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency. We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
- End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh Reconstruction from a Single RGB Image (arXiv, 2022-04-18)
  We propose a single-stage pipeline for multi-hand reconstruction. Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose, and texture. Our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings.
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements (arXiv, 2021-11-01)
  We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images. Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video (arXiv, 2021-06-22)
  We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera. To address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera (arXiv, 2021-06-15)
  We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands. Our approach combines an extensive list of favorable properties; notably, it is marker-less. We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
- Skeleton-aware multi-scale heatmap regression for 2D hand pose estimation (arXiv, 2021-05-23)
  We propose a new deep learning-based framework that consists of two main modules. The first module uses a segmentation-based approach to detect the hand skeleton and localize the hand bounding box. The second module regresses the 2D joint locations through a multi-scale heatmap regression approach.
- Parallel mesh reconstruction streams for pose estimation of interacting hands (arXiv, 2021-04-25)
  We present a new multi-stream 3D mesh reconstruction network (MSMR-Net) for hand pose estimation from a single RGB image. Our model consists of an image encoder followed by a mesh-convolution decoder composed of connected graph convolution layers.
- Im2Mesh GAN: Accurate 3D Hand Mesh Recovery from a Single RGB Image (arXiv, 2021-01-27)
  We show that the hand mesh can be learned directly from the input image. We propose a new type of GAN, called Im2Mesh GAN, to learn the mesh through end-to-end adversarial training.
- End-to-End Human Pose and Mesh Reconstruction with Transformers (arXiv, 2020-12-17)
  We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. METRO does not rely on any parametric mesh models like SMPL, so it can be easily extended to other objects such as hands. We demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset.
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion (arXiv, 2020-06-28)
  We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches. We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map. Our approach significantly outperforms existing approaches in terms of object reconstruction accuracy.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.