Consistent 3D Hand Reconstruction in Video via self-supervised Learning
- URL: http://arxiv.org/abs/2201.09548v2
- Date: Mon, 20 Mar 2023 04:28:10 GMT
- Title: Consistent 3D Hand Reconstruction in Video via self-supervised Learning
- Authors: Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng
Yang, and Junsong Yuan
- Abstract summary: We present a method for reconstructing accurate and consistent 3D hands from a monocular video.
detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand.
We propose $rm S2HAND$, a self-supervised 3D hand reconstruction model.
- Score: 67.55449194046996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method for reconstructing accurate and consistent 3D hands from
a monocular video. We observe that detected 2D hand keypoints and the image
texture provide important cues about the geometry and texture of the 3D hand,
which can reduce or even eliminate the requirement on 3D hand annotation. Thus
we propose ${\rm {S}^{2}HAND}$, a self-supervised 3D hand reconstruction model,
that can jointly estimate pose, shape, texture, and the camera viewpoint from a
single RGB input through the supervision of easily accessible 2D detected
keypoints. We leverage the continuous hand motion information contained in the
unlabeled video data and propose ${\rm {S}^{2}HAND(V)}$, which uses a set of
weights shared ${\rm {S}^{2}HAND}$ to process each frame and exploits
additional motion, texture, and shape consistency constrains to promote more
accurate hand poses and more consistent shapes and textures. Experiments on
benchmark datasets demonstrate that our self-supervised approach produces
comparable hand reconstruction performance compared with the recent
full-supervised methods in single-frame as input setup, and notably improves
the reconstruction accuracy and consistency when using video training data.
Related papers
- Reconstructing Hands in 3D with Transformers [64.15390309553892]
We present an approach that can reconstruct hands in 3D from monocular input.
Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work.
arXiv Detail & Related papers (2023-12-08T18:59:07Z) - SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z) - HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image [41.580285338167315]
This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image.
We use the hand shape to constrain the possible relative configuration of the hand and object geometry.
We show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods.
arXiv Detail & Related papers (2023-09-14T17:42:08Z) - Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z) - MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand
Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z) - HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose
Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z) - Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data [77.34069717612493]
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps.
This is enabled by a new learning based architecture designed such that it can make use of all the sources of available hand training data.
It features a 3D hand joint detection module and an inverse kinematics module which regresses not only 3D joint positions but also maps them to joint rotations in a single feed-forward pass.
arXiv Detail & Related papers (2020-03-21T03:51:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.