HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images
- URL: http://arxiv.org/abs/2405.01066v3
- Date: Tue, 14 May 2024 11:47:26 GMT
- Title: HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images
- Authors: Zixun Jiao, Xihan Wang, Zhaoqiang Xia, Lianhe Shao, Quanli Gao
- Abstract summary: We propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C).
In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field.
Our proposed HandS3C achieves state-of-the-art performance while maintaining a minimal number of parameters.
- Score: 4.252549987351642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing the hand mesh from a single RGB image is a challenging task because hands are often occluded by other objects. Most previous works explore additional information and adopt attention mechanisms to improve 3D reconstruction performance, but this simultaneously increases computational complexity. To achieve a performance-preserving architecture with high computational efficiency, in this work, we propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C), which is the first to incorporate a state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features in the spatial dimension, and enhances regional features of hands in the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets facing heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandS3C achieves state-of-the-art performance while maintaining a minimal number of parameters.
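The abstract describes a module that combines a state-space scan over spatial positions with channel-wise re-weighting. The paper's exact formulation is not given here, so below is a minimal sketch of how such a block might be wired, assuming a simplified diagonal state-space recurrence for the spatial branch and a squeeze-and-excite-style gate for the channel branch; the names (`SSSCAttention`, `ssm_scan`, `d_state`) are illustrative, not taken from the paper.

```python
# Minimal sketch of a state-space spatial-channel attention block.
# Assumptions: a simplified diagonal SSM scan over flattened spatial
# positions, plus a squeeze-excite-style channel gate. Not the paper's
# exact design.
import torch
import torch.nn as nn


def ssm_scan(x, A, B, C):
    """Diagonal state-space recurrence over the sequence dimension.

    x: (batch, length, channels); A, B, C: (channels, d_state).
    h_t = sigmoid(A) * h_{t-1} + B * x_t ;  y_t = sum_state(C * h_t)
    """
    b, l, c = x.shape
    h = x.new_zeros(b, c, A.shape[-1])
    decay = torch.sigmoid(A)  # keep the recurrence stable in (0, 1)
    ys = []
    for t in range(l):
        h = decay * h + B * x[:, t, :, None]   # (b, c, d_state)
        ys.append((h * C).sum(-1))             # (b, c)
    return torch.stack(ys, dim=1)              # (b, l, c)


class SSSCAttention(nn.Module):
    """Spatial SSM scan followed by channel re-weighting (hypothetical)."""

    def __init__(self, channels, d_state=16, reduction=4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(channels, d_state))
        self.B = nn.Parameter(torch.randn(channels, d_state) * 0.1)
        self.C = nn.Parameter(torch.randn(channels, d_state) * 0.1)
        self.norm = nn.LayerNorm(channels)
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.GELU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, height, width) feature map
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))      # (b, h*w, c)
        spatial = ssm_scan(seq, self.A, self.B, self.C)    # spatial branch
        gate = self.channel_gate(seq.mean(dim=1))          # channel branch
        out = spatial * gate[:, None, :]
        return out.transpose(1, 2).reshape(b, c, h, w) + x  # residual


if __name__ == "__main__":
    block = SSSCAttention(channels=32)
    feats = torch.randn(2, 32, 8, 8)
    print(block(feats).shape)  # torch.Size([2, 32, 8, 8])
```

The scan gives every position access to all earlier positions in the flattened sequence at linear cost, which is the usual motivation for swapping quadratic attention for a state-space model; the channel gate then re-weights feature maps globally.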
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time, fully convolutional hand localization network and a high-fidelity, transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
- 3D Hand Mesh Recovery from Monocular RGB in Camera Space [3.0453197258042213]
This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks.
We utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks.
Our proposed model is comparable to state-of-the-art models.
arXiv Detail & Related papers (2024-05-12T05:36:37Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation [90.59734612754222]
Estimating a 3D hand mesh from RGB images is one of the most challenging problems.
Existing attempts towards this task often fail when the occlusion dominates the image space.
We propose SiMA-Hand, aiming to boost the mesh reconstruction performance by Single-to-Multi-view Adaptation.
arXiv Detail & Related papers (2024-02-02T13:14:20Z)
- HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image [41.580285338167315]
This paper presents a method to learn a hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image.
We use the hand shape to constrain the possible relative configuration of the hand and object geometry.
We show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods.
arXiv Detail & Related papers (2023-09-14T17:42:08Z)
- Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z)
- UV-Based 3D Hand-Object Reconstruction with Grasp Optimization [23.06364591130636]
We propose a novel framework for 3D hand shape reconstruction and hand-object grasp optimization from a single RGB image.
Instead of approximating the contact regions with sparse points, we propose a dense representation in the form of a UV coordinate map.
Our pipeline increases hand shape reconstruction accuracy and produces a vibrant hand texture.
arXiv Detail & Related papers (2022-11-24T05:59:23Z)
- Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction [57.3636347704271]
3D hand-mesh reconstruction from RGB images facilitates many applications, including augmented reality (AR).
This paper presents a novel pipeline by decoupling the hand-mesh reconstruction task into three stages.
We can promote high-quality finger-level mesh-image alignment and drive the models together to deliver real-time predictions.
arXiv Detail & Related papers (2021-09-03T20:42:01Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild [59.158592526006814]
We train our network by gathering a large-scale dataset of hand actions from YouTube videos.
Our weakly-supervised, mesh-convolution-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark.
arXiv Detail & Related papers (2020-04-04T14:35:37Z)