RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose
Estimation
- URL: http://arxiv.org/abs/2309.09301v3
- Date: Wed, 27 Sep 2023 16:02:13 GMT
- Title: RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose
Estimation
- Authors: Lijun Li, Linrui Tian, Xindi Zhang, Qi Wang, Bang Zhang, Mengyuan Liu,
and Chen Chen
- Abstract summary: We present a large-scale synthetic dataset RenderIH for interacting hands with accurate pose annotations.
The dataset contains 1M photo-realistic images with varied backgrounds, perspectives, and hand textures.
For better pose estimation accuracy, we introduce a transformer-based pose estimation network, TransHand.
- Score: 19.840282327688776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current interacting hand (IH) datasets have relatively simple
backgrounds and textures, their hand joints are annotated by a machine
annotator, which may introduce inaccuracies, and the diversity of their pose
distribution is limited. However, the variability of background, pose
distribution, and texture can greatly influence the generalization ability.
Therefore, we present a large-scale synthetic dataset RenderIH for interacting
hands with accurate and diverse pose annotations. The dataset contains 1M
photo-realistic images with varied backgrounds, perspectives, and hand
textures. To generate natural and diverse interacting poses, we propose a new
pose optimization algorithm. Additionally, for better pose estimation accuracy,
we introduce a transformer-based pose estimation network, TransHand, to
leverage the correlation between interacting hands and verify the effectiveness
of RenderIH in improving results. Our dataset is model-agnostic and can improve
the accuracy of any hand pose estimation method more than other real or
synthetic datasets. Experiments have shown that pretraining on our synthetic
data can significantly decrease the error from 6.76mm to 5.79mm, and our
TransHand surpasses contemporary methods. Our dataset and code are available at
https://github.com/adwardlee/RenderIH.
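To put the reported numbers in context, the drop from 6.76mm to 5.79mm corresponds to a relative error reduction of roughly 14%. The sketch below shows that arithmetic alongside a per-joint position error computed on toy data. It assumes the reported error is a mean per-joint position error, which the abstract does not state; the function names `mpjpe` and `relative_reduction` and the toy joints are hypothetical and are not taken from the RenderIH code.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth 3D joints.

    Assumption: this is the kind of error the abstract reports in mm;
    the abstract itself does not name the metric.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(before, after):
    """Fraction by which the error shrinks when going from `before` to `after`."""
    return (before - after) / before


# Toy joints in millimetres (hypothetical values, not from the dataset).
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (10.0, 0.0, 1.0)]
toy_error = mpjpe(pred, gt)  # per-joint distances are 5 and 1

# Figures quoted in the abstract: 6.76mm without and 5.79mm with pretraining.
reduction = relative_reduction(6.76, 5.79)

print(f"toy error: {toy_error:.2f} mm")
print(f"relative reduction: {reduction:.1%}")
```

The relative figure is only a convenience for comparing against other results; the abstract itself reports the absolute values.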
Related papers
- Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation [41.382984217586504]
This paper presents the first systematic study of the synthetic-to-real gap of 3D hand pose estimation.
To facilitate our analysis, we propose a data synthesis pipeline to synthesize high-quality data.
We demonstrate that synthetic hand data can achieve the same level of accuracy as real data when integrating our identified components.
arXiv Detail & Related papers (2025-03-25T03:13:23Z)
- JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting [3.1143479095236892]
Jointly 3D Gaussian Hand (JGHand) is a novel joint-driven 3D Gaussian Splatting (3DGS)-based hand representation.
We show that JGHand achieves real-time rendering speeds with enhanced quality, surpassing state-of-the-art methods.
arXiv Detail & Related papers (2025-01-31T12:33:24Z)
- HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation [29.766317710266765]
We propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction.
We use mesh-based 3DGS to model objects and hands, and we address the rendering blur caused by the multi-resolution input images.
We extend the single-hand grasping pose optimization module to the bimanual hand-object setting to generate various poses of bimanual hand-object interaction.
arXiv Detail & Related papers (2025-01-06T08:48:17Z)
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors [4.697267141773321]
We present DreamHOI, a novel method for zero-shot synthesis of human-object interactions (HOIs).
We leverage text-to-image diffusion models trained on billions of image-caption pairs to generate realistic HOIs.
We validate our approach through extensive experiments, demonstrating its effectiveness in generating realistic HOIs.
arXiv Detail & Related papers (2024-09-12T17:59:49Z)
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It features disentangling the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility.
arXiv Detail & Related papers (2024-06-26T00:08:29Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation [3.126179109712709]
We propose a mesh represented recycle learning strategy for 3D hand pose and mesh estimation.
To be specific, a hand pose and mesh estimation model first predicts parametric 3D hand annotations.
Second, synthetic hand images are generated with self-estimated hand mesh representations.
Third, the synthetic hand images are fed into the same model again.
arXiv Detail & Related papers (2023-10-18T09:50:09Z)
- Denoising Diffusion for 3D Hand Pose Estimation from Images [38.20064386142944]
This paper addresses the problem of 3D hand pose estimation from monocular images or sequences.
We present a novel end-to-end framework for 3D hand regression that employs diffusion models that have shown excellent ability to capture the distribution of data for generative purposes.
The proposed model provides state-of-the-art performance when lifting a 2D single-hand image to 3D.
arXiv Detail & Related papers (2023-08-18T12:57:22Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- Methodology for Building Synthetic Datasets with Virtual Humans [1.5556923898855324]
Large datasets can be used for improved, targeted training of deep neural networks.
In particular, we make use of a 3D morphable face model for the rendering of multiple 2D images across a dataset of 100 synthetic identities.
arXiv Detail & Related papers (2020-06-21T10:29:36Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.