Related papers: Reconstructing Hands in 3D with Transformers

Reconstructing Hands in 3D with Transformers

URL: http://arxiv.org/abs/2312.05251v1
Date: Fri, 8 Dec 2023 18:59:07 GMT
Title: Reconstructing Hands in 3D with Transformers
Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik
Abstract summary: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work.
Score: 64.15390309553892
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand reconstruction. For training data, we combine multiple datasets that contain 2D or 3D hand annotations. For the deep model, we use a large scale Vision Transformer architecture. Our final model consistently outperforms the previous baselines on popular 3D hand pose benchmarks. To further evaluate the effect of our design in non-controlled settings, we annotate existing in-the-wild datasets with 2D hand keypoint annotations. On this newly collected dataset of annotations, HInt, we demonstrate significant improvements over existing baselines. We make our code, data and models available on the project website: https://geopavlakos.github.io/hamer/.

Related papers

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild. The proposed pipeline is composed of two components: a real-time fully convolutional hand localization and a high-fidelity transformer-based 3D hand reconstruction model. Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh Reconstruction from a Single RGB Image [9.238322841389994]
We propose a single-stage pipeline for multi-hand reconstruction. Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose and texture. Our method outperforms the state-of-the-art model-based methods in both weakly-supervised and fully-supervised manners.
arXiv Detail & Related papers (2022-04-18T03:57:14Z)
Consistent 3D Hand Reconstruction in Video via self-supervised Learning [67.55449194046996]
We present a method for reconstructing accurate and consistent 3D hands from a monocular video. detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand. We propose $rm S2HAND$, a self-supervised 3D hand reconstruction model.
arXiv Detail & Related papers (2022-01-24T09:44:11Z)
HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks [71.09275975580009]
HandVoxNet++ is a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology. We combine the advantages of both representations by aligning the hand surface to the voxelized hand shape either with a new neural Graph-Convolutions-based Mesh Registration (GCN-MeshReg) or classical segment-wise Non-Rigid Gravitational Approach (NRGA++) which
arXiv Detail & Related papers (2021-07-02T17:59:54Z)
Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
MVHM: A Large-Scale Multi-View Hand Mesh Benchmark for Accurate 3D Hand Pose Estimation [32.12879364117658]
Estimating 3D hand poses from a single RGB image is challenging because depth ambiguity leads the problem ill-posed. We design a spin match algorithm that enables a rigid mesh model matching with any target mesh ground truth. We present a multi-view hand pose estimation approach to verify that training a hand pose estimator with our generated dataset greatly enhances the performance.
arXiv Detail & Related papers (2020-12-06T07:55:08Z)
BiHand: Recovering Hand Mesh with Multi-stage Bisected Hourglass Networks [37.65510556305611]
We introduce an end-to-end learnable model, BiHand, which consists of three cascaded stages, namely 2D seeding stage, 3D lifting stage, and mesh generation stage. At the output of BiHand, the full hand mesh will be recovered using the joint rotations and shape parameters predicted from the network. Our model can achieve superior accuracy in comparison with state-of-the-art methods, and can produce appealing 3D hand meshes in several severe conditions.
arXiv Detail & Related papers (2020-08-12T03:13:17Z)
Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps. This is enabled by a new learning based architecture designed such that it can make use of all the sources of available hand training data. It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
arXiv Detail & Related papers (2020-03-21T03:51:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.