Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
- URL: http://arxiv.org/abs/2503.19307v1
- Date: Tue, 25 Mar 2025 03:13:23 GMT
- Title: Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
- Authors: Zhuoran Zhao, Linlin Yang, Pengzhan Sun, Pan Hui, Angela Yao
- Abstract summary: This paper presents the first systematic study of the synthetic-to-real gap of 3D hand pose estimation. To facilitate our analysis, we propose a data synthesis pipeline to synthesize high-quality data. We demonstrate that synthetic hand data can achieve the same level of accuracy as real data when integrating our identified components.
- Score: 41.382984217586504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent synthetic 3D human datasets for the face, body, and hands have pushed the limits on photorealism. Face recognition and body pose estimation have achieved state-of-the-art performance using synthetic training data alone, but for the hand, there is still a large synthetic-to-real gap. This paper presents the first systematic study of the synthetic-to-real gap of 3D hand pose estimation. We analyze the gap and identify key components such as the forearm, image frequency statistics, hand pose, and object occlusions. To facilitate our analysis, we propose a data synthesis pipeline to synthesize high-quality data. We demonstrate that synthetic hand data can achieve the same level of accuracy as real data when integrating our identified components, paving the path to use synthetic data alone for hand pose estimation. Code and data are available at: https://github.com/delaprada/HandSynthesis.git.
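The abstract names image frequency statistics as one of the gap components, but does not specify which statistic is used. A common, minimal way to compare frequency content between synthetic and real crops is an azimuthally averaged power spectrum; the sketch below uses random arrays as stand-ins for real image data and is purely illustrative, not the paper's method.

```python
import numpy as np

def radial_power_spectrum(img: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a grayscale image.

    Comparing this 1-D profile between synthetic and real hand crops is
    one common way to quantify an image-frequency-statistics gap.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.log1p(np.abs(f) ** 2)
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)          # radius of each frequency bin
    bins = np.linspace(0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    profile = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return profile / np.maximum(counts, 1)        # mean log power per radius

rng = np.random.default_rng(0)
real_like = rng.normal(size=(64, 64))             # stand-in for a real crop
synth_like = rng.normal(size=(64, 64)) * 0.5      # stand-in for a synthetic crop
gap = np.abs(radial_power_spectrum(real_like) - radial_power_spectrum(synth_like)).mean()
```

Averaging this gap over many image pairs gives one scalar proxy for how far synthetic frequency statistics are from real ones.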
Related papers
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task. Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data. Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior arts.
arXiv Detail & Related papers (2024-12-19T03:39:13Z) - G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
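The details of G-HOP's skeletal distance field are not given in this summary. As a hypothetical minimal sketch, the distance from query points to a skeleton can be computed as the minimum point-to-bone-segment distance; the joint positions and bone topology below are toy values, not the paper's hand model.

```python
import numpy as np

def point_segment_dist(p: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Distance from points p (N, 3) to the line segment a-b."""
    ab = b - a
    t = np.clip(((p - a) @ ab) / (ab @ ab), 0.0, 1.0)  # projection parameter
    proj = a + t[:, None] * ab                          # closest point on segment
    return np.linalg.norm(p - proj, axis=1)

def skeletal_distance_field(queries: np.ndarray, joints: np.ndarray, bones) -> np.ndarray:
    """Minimum distance from each query point to the skeleton,
    where bones is a list of (parent, child) joint-index pairs."""
    d = np.stack([point_segment_dist(queries, joints[i], joints[j])
                  for i, j in bones])
    return d.min(axis=0)

# toy example: a two-bone "finger" along the z-axis
joints = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 2]], dtype=float)
bones = [(0, 1), (1, 2)]
queries = np.array([[0, 0, 0.5], [1, 0, 0]], dtype=float)
sdf = skeletal_distance_field(queries, joints, bones)  # [0.0, 1.0]
```

A field like this is defined everywhere in space, which is what makes it straightforward to align with a signed distance field for the object.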
arXiv Detail & Related papers (2024-04-18T17:59:28Z) - HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z) - Data-Agnostic Face Image Synthesis Detection Using Bayesian CNNs [23.943447945946705]
We propose a data-agnostic solution to detect the face image synthesis process.
Our solution is based on an anomaly detection framework that requires only real data to learn the inference process.
arXiv Detail & Related papers (2024-01-08T21:23:23Z) - Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? [12.987587227876565]
We investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction detection.
By leveraging only 10% of real labeled data, we achieve improvements in Overall AP compared to baselines trained exclusively on real data.
arXiv Detail & Related papers (2023-12-05T11:29:00Z) - Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data [11.31971398273479]
We propose measures to add real-world noise to synthetic physiological signals and corresponding facial videos.
Our results show that we were able to reduce the average MAE from 6.9 to 2.0.
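The paper's exact noise model is not given in this summary. A hypothetical sketch of the general idea, adding sensor noise and a slow motion-artifact drift to a synthetic pulse signal, might look like:

```python
import numpy as np

rng = np.random.default_rng(42)
fs = 30                                   # frames per second of the synthetic video
t = np.arange(0, 10, 1 / fs)
clean_ppg = np.sin(2 * np.pi * 1.2 * t)   # synthetic pulse at ~72 bpm

# hypothetical noise model: white sensor noise plus a slow motion-artifact drift
sensor_noise = rng.normal(scale=0.05, size=t.shape)
motion_drift = 0.3 * np.sin(2 * np.pi * 0.1 * t + rng.uniform(0, 2 * np.pi))
noisy_ppg = clean_ppg + sensor_noise + motion_drift
```

Training on signals corrupted this way exposes the model to the kinds of artifacts real videos contain, which is the general mechanism behind the reported MAE improvement.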
arXiv Detail & Related papers (2023-11-09T13:55:45Z) - RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation [19.840282327688776]
We present a large-scale synthetic dataset RenderIH for interacting hands with accurate pose annotations.
The dataset contains 1M photo-realistic images with varied backgrounds, perspectives, and hand textures.
For better pose estimation accuracy, we introduce a transformer-based pose estimation network, TransHand.
arXiv Detail & Related papers (2023-09-17T15:30:58Z) - ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real Novel View Synthesis via Contrastive Learning [102.46382882098847]
We first investigate the effects of synthetic data in synthetic-to-real novel view synthesis.
We propose to introduce geometry-aware contrastive learning to learn multi-view consistent features with geometric constraints.
Our method can render images with higher quality and better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS.
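Of the three metrics mentioned, PSNR is simple enough to state exactly (SSIM and LPIPS require dedicated implementations); a minimal version:

```python
import numpy as np

def psnr(ref: np.ndarray, img: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB against a reference image; higher is better."""
    mse = np.mean((ref - img) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1                      # uniform error of 0.1
# MSE = 0.01, so PSNR = 10 * log10(1 / 0.01) = 20 dB
```

PSNR measures per-pixel fidelity only; SSIM and LPIPS are used alongside it precisely because they capture structural and perceptual quality that PSNR misses.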
arXiv Detail & Related papers (2023-03-20T12:06:14Z) - Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically rely on extensive annotations of real scene scans.
We explore how synthetic models can relieve this burden by aligning the features of real scene categories and synthetic models into a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on S3DIS.
arXiv Detail & Related papers (2022-03-20T13:06:15Z) - Fake It Till You Make It: Face analysis in the wild using synthetic data alone [9.081019005437309]
We show that it is possible to perform face-related computer vision in the wild using synthetic data alone.
We describe how to combine a procedurally-generated 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism.
arXiv Detail & Related papers (2021-09-30T13:07:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.