Interacting Attention Graph for Single Image Two-Hand Reconstruction
- URL: http://arxiv.org/abs/2203.09364v2
- Date: Fri, 18 Mar 2022 06:55:19 GMT
- Title: Interacting Attention Graph for Single Image Two-Hand Reconstruction
- Authors: Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu,
Yebin Liu
- Abstract summary: We present Interacting Attention Graph Hand (IntagHand), the first graph-convolution-based network that reconstructs two interacting hands from a single RGB image.
Our model outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M benchmark.
- Score: 32.342152070402236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolutional networks (GCNs) have achieved great success in the
single-hand reconstruction task, while interacting two-hand reconstruction with
GCNs remains unexplored. In this paper, we present Interacting Attention Graph Hand
(IntagHand), the first graph-convolution-based network that reconstructs two
interacting hands from a single RGB image. To solve the occlusion and interaction
challenges of two-hand reconstruction, we introduce two novel attention-based
modules in each upsampling step of the original GCN. The first module is the
pyramid image feature attention (PIFA) module, which utilizes multiresolution
features to implicitly obtain vertex-to-image alignment. The second module is
the cross hand attention (CHA) module, which encodes the coherence of interacting
hands by building dense cross-attention between the vertices of the two hands. As a
result, our model outperforms all existing two-hand reconstruction methods by a
large margin on the InterHand2.6M benchmark. Moreover, ablation studies verify the
effectiveness of both the PIFA and CHA modules in improving reconstruction
accuracy. Results on in-the-wild images and live video streams further
demonstrate the generalization ability of our network. Our code is available at
https://github.com/Dw1010/IntagHand.
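The abstract describes two attention modules inserted at each GCN upsampling step. The cross hand attention (CHA) idea in particular amounts to dense cross-attention between the per-vertex features of the two hands. Below is a minimal PyTorch sketch of that idea only; the class name, dimensions, and residual/LayerNorm wiring are illustrative assumptions, not the authors' implementation (see the official code at https://github.com/Dw1010/IntagHand).

```python
# Minimal sketch of the cross hand attention (CHA) idea: dense cross-attention
# between the per-vertex features of the two hands. Names, dimensions, and the
# residual/LayerNorm wiring are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class CrossHandAttention(nn.Module):
    """Each hand's vertex features attend to the other hand's vertex features."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn_l2r = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.attn_r2l = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm_l = nn.LayerNorm(feat_dim)
        self.norm_r = nn.LayerNorm(feat_dim)

    def forward(self, left_verts: torch.Tensor, right_verts: torch.Tensor):
        # left_verts, right_verts: (batch, num_vertices, feat_dim) per-vertex features.
        # Queries come from one hand and keys/values from the other, so every
        # vertex can aggregate context from the interacting hand.
        l_ctx, _ = self.attn_l2r(left_verts, right_verts, right_verts)
        r_ctx, _ = self.attn_r2l(right_verts, left_verts, left_verts)
        left_out = self.norm_l(left_verts + l_ctx)    # residual connection + norm
        right_out = self.norm_r(right_verts + r_ctx)
        return left_out, right_out


if __name__ == "__main__":
    # 778 is the full MANO mesh vertex count; a coarse-to-fine GCN would use
    # fewer vertices at intermediate upsampling steps.
    left = torch.randn(2, 778, 256)
    right = torch.randn(2, 778, 256)
    cha = CrossHandAttention()
    l_out, r_out = cha(left, right)
    print(l_out.shape, r_out.shape)  # torch.Size([2, 778, 256]) twice
```

The PIFA module could be sketched along the same lines, with vertex features as queries and flattened multi-resolution image feature maps as keys and values.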
Related papers
- Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba [48.45301469664908]
3D hand reconstruction from a single RGB image is challenging due to articulated motion, self-occlusion, and interaction with objects.
Existing SOTA methods employ attention-based transformers to learn the 3D hand pose and shape.
We propose a novel graph-guided Mamba framework, named Hamba, which bridges graph learning and state space modeling.
arXiv Detail & Related papers (2024-07-12T19:04:58Z)
- Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering [11.228453237603834]
We present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details.
We also introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures.
Our proposed approach outperforms the state-of-the-art methods on both reconstruction accuracy and rendering quality.
arXiv Detail & Related papers (2024-07-08T07:28:24Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective, jointly with feature reconstruction, to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction [19.82874341207336]
We propose to reconstruct meshes and estimate MANO parameters of two hands from a single RGB image simultaneously.
The Mesh-Mano interaction block (MMIB) consists of one graph residual block to aggregate local information and two transformer encoders to model long-range dependencies.
Experiments on the InterHand2.6M benchmark demonstrate promising results over the state-of-the-art hand reconstruction methods.
arXiv Detail & Related papers (2023-03-28T04:06:02Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands.
Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency.
We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z)
- Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z)
- LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction [2.2481284426718533]
We propose a method called lightweight attention hand (LWA-HAND) to reconstruct hands with low FLOPs from a single RGB image.
The resulting model achieves comparable performance on the InterHand2.6M benchmark in comparison with the state-of-the-art models.
arXiv Detail & Related papers (2022-08-21T06:25:56Z)
- Vision GNN: An Image is Worth Graph of Nodes [49.3335689216822]
We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes.
Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
arXiv Detail & Related papers (2022-06-01T07:01:04Z)
- MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs [55.66953093401889]
We propose a masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data.
Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training.
arXiv Detail & Related papers (2022-01-07T16:48:07Z)
- Adaptive Graphical Model Network for 2D Handpose Estimation [19.592024471753025]
We propose a new architecture to tackle the task of 2D hand pose estimation from a monocular RGB image.
The Adaptive Graphical Model Network (AGMN) consists of two branches of deep convolutional neural networks for calculating unary and pairwise potential functions.
Our approach outperforms the state-of-the-art method for 2D hand keypoint estimation by a notable margin on two public datasets.
arXiv Detail & Related papers (2019-09-18T04:19:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.