Parallel mesh reconstruction streams for pose estimation of interacting
hands
- URL: http://arxiv.org/abs/2104.12123v1
- Date: Sun, 25 Apr 2021 10:14:15 GMT
- Title: Parallel mesh reconstruction streams for pose estimation of interacting
hands
- Authors: Uri Wollner and Guy Ben-Yosef
- Abstract summary: We present a new multi-stream 3D mesh reconstruction network (MSMR-Net) for hand pose estimation from a single RGB image.
Our model consists of an image encoder followed by a mesh-convolution decoder composed of connected graph convolution layers.
- Score: 2.0305676256390934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new multi-stream 3D mesh reconstruction network (MSMR-Net) for
hand pose estimation from a single RGB image. Our model consists of an image
encoder followed by a mesh-convolution decoder composed of connected graph
convolution layers. In contrast to previous models that form a single mesh
decoding path, our decoder network incorporates multiple cross-resolution
trajectories that are executed in parallel. Thus, global and local information
are shared to form rich decoding representations at minor additional parameter
cost compared to the single trajectory network. We demonstrate the
effectiveness of our method in hand-hand and hand-object interaction scenarios
at various levels of interaction. To evaluate the former scenario, we propose a
method to generate RGB images of closely interacting hands. Moreover, we
suggest a metric to quantify the degree of interaction and show that close hand
interactions are particularly challenging. Experimental results show that the
MSMR-Net outperforms existing algorithms on the hand-object FreiHAND dataset as
well as on our own hand-hand dataset.
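The decoder idea described in the abstract, several mesh-decoding streams at different resolutions that run in parallel and share features, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the ring-graph "mesh", the average-pooling and copy-based resampling, the additive fusion, the random weights, and hypothetical function names such as `decode` and `pooling_matrices`.

```python
# Toy sketch of parallel cross-resolution mesh decoding streams.
# All names, sizes and the fusion rule are illustrative assumptions,
# not the MSMR-Net implementation.
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in a]


def ring_adjacency(n):
    """Row-normalized adjacency (with self-loops) of a ring graph, n >= 3."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            adj[i][j % n] = 1.0 / 3.0
    return adj


def random_weights(rows, cols, rng):
    """Random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def graph_conv(adj, feats, weights):
    """One graph convolution: aggregate neighbors, then mix channels."""
    return relu(matmul(matmul(adj, feats), weights))


def pooling_matrices(n_coarse, factor):
    """Resampling between a coarse mesh and one with `factor` times more vertices."""
    n_fine = n_coarse * factor
    # Downsampling averages each group of `factor` fine vertices.
    down = [[1.0 / factor if f // factor == c else 0.0 for f in range(n_fine)]
            for c in range(n_coarse)]
    # Upsampling copies each coarse vertex to its group of fine vertices.
    up = [[1.0 if f // factor == c else 0.0 for c in range(n_coarse)]
          for f in range(n_fine)]
    return down, up


def decode(coarse, fine, n_layers=2, seed=0):
    """Run a coarse and a fine stream in parallel with feature exchange."""
    rng = random.Random(seed)
    channels = len(coarse[0])
    factor = len(fine) // len(coarse)
    adj_c, adj_f = ring_adjacency(len(coarse)), ring_adjacency(len(fine))
    down, up = pooling_matrices(len(coarse), factor)
    for _ in range(n_layers):
        # Each stream applies a graph convolution at its own resolution.
        new_c = graph_conv(adj_c, coarse, random_weights(channels, channels, rng))
        new_f = graph_conv(adj_f, fine, random_weights(channels, channels, rng))
        # The streams then exchange information across resolutions.
        coarse = add(new_c, matmul(down, new_f))
        fine = add(new_f, matmul(up, new_c))
    return coarse, fine


rng = random.Random(1)
coarse_in = random_weights(4, 3, rng)   # 4 vertices, 3 feature channels
fine_in = random_weights(16, 3, rng)    # 16 vertices, 3 feature channels
coarse_out, fine_out = decode(coarse_in, fine_in)
print(len(coarse_out), len(fine_out), len(fine_out[0]))
```

In this sketch the coarse stream carries global context and the fine stream carries local detail. The only extra cost over a single stream is the small coarse-resolution convolution plus the parameter-free resampling, which loosely mirrors the "minor additional parameter cost" the abstract mentions.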
Related papers
- Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering [11.228453237603834]
We present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details.
We also introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures.
Our proposed approach outperforms the state-of-the-art methods on both reconstruction accuracy and rendering quality.
arXiv Detail & Related papers (2024-07-08T07:28:24Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction [19.82874341207336]
We propose to reconstruct meshes and estimate MANO parameters of two hands from a single RGB image simultaneously.
The Mesh-Mano interaction block (MMIB) consists of one graph residual block to aggregate local information and two transformer encoders to model long-range dependencies.
Experiments on the InterHand2.6M benchmark demonstrate promising results over the state-of-the-art hand reconstruction methods.
arXiv Detail & Related papers (2023-03-28T04:06:02Z)
- A Model-data-driven Network Embedding Multidimensional Features for Tomographic SAR Imaging [5.489791364472879]
We propose a new model-data-driven network to achieve tomoSAR imaging based on multi-dimensional features.
We add two 2D processing modules, both convolutional encoder-decoder structures, to enhance multi-dimensional features of the imaging scene effectively.
Compared with the conventional CS-based FISTA method and DL-based gamma-Net method, the result of our proposed method has better performance on completeness while having decent imaging accuracy.
arXiv Detail & Related papers (2022-11-28T02:01:43Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Im2Mesh GAN: Accurate 3D Hand Mesh Recovery from a Single RGB Image [31.371190180801452]
We show that the hand mesh can be learned directly from the input image.
We propose a new type of GAN called Im2Mesh GAN to learn the mesh through end-to-end adversarial training.
arXiv Detail & Related papers (2021-01-27T07:38:01Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
- Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild [59.158592526006814]
We train our network by gathering a large-scale dataset of hand action in YouTube videos.
Our weakly-supervised, mesh-convolution-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark.
arXiv Detail & Related papers (2020-04-04T14:35:37Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.