Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery
- URL: http://arxiv.org/abs/2206.10457v1
- Date: Tue, 21 Jun 2022 15:02:31 GMT
- Title: Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery
- Authors: Zhenzhen Weng, Kuan-Chieh Wang, Angjoo Kanazawa, Serena Yeung
- Abstract summary: Domain Adaptive 3D Pose Augmentation (DAPA) is a data augmentation method that enhances the model's generalization ability in in-the-wild scenarios.
We show quantitatively that finetuning with DAPA effectively improves results on benchmarks 3DPW and AGORA.
- Score: 32.73513554145019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to perceive 3D human bodies from a single image has a multitude
of applications ranging from entertainment and robotics to neuroscience and
healthcare. A fundamental challenge in human mesh recovery is collecting the
ground-truth 3D mesh targets required for training, which demands burdensome
motion capture systems and is often limited to indoor laboratories. As a
result, while progress is made on benchmark datasets collected in these
restrictive settings, models fail to generalize to real-world "in-the-wild"
scenarios due to distribution shifts. We propose Domain Adaptive 3D Pose
Augmentation (DAPA), a data augmentation method that enhances the model's
generalization ability in in-the-wild scenarios. DAPA combines the strengths of
methods based on synthetic datasets, obtaining direct supervision from the
synthesized meshes, with those of domain adaptation methods, which use ground-truth 2D
keypoints from the target dataset. We show quantitatively that finetuning with
DAPA effectively improves results on benchmarks 3DPW and AGORA. We further
demonstrate the utility of DAPA on a challenging dataset curated from videos of
real-world parent-child interaction.
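A minimal sketch of the kind of finetuning objective the abstract describes follows: synthesized samples carry direct 3D mesh supervision, while target-domain images contribute only a 2D keypoint reprojection term. The `MeshRegressor` placeholder, the weak-perspective camera, and the loss weights are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a DAPA-style finetuning step, combining direct 3D
# supervision from synthesized meshes with a 2D keypoint reprojection loss
# on target-domain images. All names and the camera model are hypothetical.
import torch
import torch.nn as nn

class MeshRegressor(nn.Module):
    """Placeholder for an HMR-style image encoder + parameter regressor."""
    def __init__(self, n_params=85, n_joints=24):
        super().__init__()
        self.n_joints = n_joints
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 224 * 224, 256), nn.ReLU())
        self.head_params = nn.Linear(256, n_params)      # pose/shape/camera
        self.head_joints = nn.Linear(256, n_joints * 3)  # 3D joint locations

    def forward(self, images):
        feat = self.backbone(images)
        params = self.head_params(feat)
        joints3d = self.head_joints(feat).view(-1, self.n_joints, 3)
        return params, joints3d

def weak_perspective_project(joints3d, scale, trans):
    """Project 3D joints to 2D with a weak-perspective camera model."""
    return scale[:, None, None] * joints3d[..., :2] + trans[:, None, :]

def dapa_finetune_step(model, opt, synth_imgs, synth_params, synth_j3d,
                       real_imgs, real_kp2d, w_synth=1.0, w_kp=1.0):
    # Synthesized samples carry full mesh-parameter and 3D joint supervision.
    pred_params, pred_j3d = model(synth_imgs)
    loss_synth = ((pred_params - synth_params) ** 2).mean() \
               + ((pred_j3d - synth_j3d) ** 2).mean()

    # In-the-wild samples only supervise reprojected 2D keypoints.
    params_real, j3d_real = model(real_imgs)
    scale, trans = params_real[:, 0], params_real[:, 1:3]
    loss_kp = ((weak_perspective_project(j3d_real, scale, trans)
                - real_kp2d) ** 2).mean()

    loss = w_synth * loss_synth + w_kp * loss_kp
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The point the abstract makes is the combination: synthetic data provides the direct mesh supervision that in-the-wild data lacks, while the target dataset's 2D keypoints steer the model toward the real-world distribution.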
Related papers
- Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [11.090775523892074]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data.
Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues (a generic lifting sketch follows this entry).
Our method achieves up to 85% of the fully-supervised performance using only 10% of the labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z)
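As a generic illustration of how 2D foundation-model outputs can be lifted into 3D cues (the paper's actual pipeline may differ substantially), the sketch below back-projects per-pixel semantic labels into a labeled 3D point cloud using a depth map and camera intrinsics; all names are hypothetical.

```python
# Generic sketch: lifting 2D semantic predictions into 3D pseudo-labels by
# back-projecting pixels with depth and camera intrinsics. This illustrates
# the general 2D-to-3D cue-lifting idea, not the paper's actual pipeline.
import numpy as np

def backproject_semantics(depth, labels, K):
    """Lift per-pixel labels to a labeled 3D point cloud.

    depth:  (H, W) metric depth map
    labels: (H, W) per-pixel class ids from a 2D foundation model
    K:      (3, 3) camera intrinsics
    Returns (N, 3) points and (N,) labels for pixels with valid depth.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]   # (u - cx) * z / fx
    y = (v[valid] - K[1, 2]) * z / K[1, 1]   # (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)
    return points, labels[valid]
```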
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z)
- Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection [50.448520056844885]
We propose a novel framework for syn-to-real unsupervised domain adaptation in indoor 3D object detection.
Adapting from the synthetic dataset 3D-FRONT to the real-world datasets ScanNetV2 and SUN RGB-D yields mAP25 improvements of 9.7% and 9.1%, respectively, over Source-Only baselines.
arXiv Detail & Related papers (2024-06-17T08:18:41Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation [18.011044932979143]
3DUDA is a method capable of adapting to a nuisance-ridden target domain without 3D or depth data.
We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations.
We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions.
arXiv Detail & Related papers (2024-01-19T17:48:05Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain (a minimal sketch of this mechanism follows this entry).
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z)
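The teacher-student pseudo-labeling mentioned above can be sketched generically as an EMA teacher whose confident predictions supervise the student; the model interface, confidence threshold, and momentum below are illustrative assumptions, not STMono3D's implementation.

```python
# Generic teacher-student pseudo-labeling sketch (not the STMono3D code).
# The teacher is an EMA copy of the student; its confident predictions on
# unlabeled target-domain images become training targets for the student.
# Assumed interface: model(images) -> (predictions, per-sample confidence).
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of the teacher's weights."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def self_training_step(student, teacher, opt, target_imgs, conf_thresh=0.8):
    with torch.no_grad():
        pseudo_targets, conf = teacher(target_imgs)
    keep = conf > conf_thresh                # keep only confident pseudo labels

    preds, _ = student(target_imgs)
    if keep.any():
        loss = ((preds[keep] - pseudo_targets[keep]) ** 2).mean()
    else:
        loss = preds.sum() * 0.0             # no confident labels this batch

    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)             # teacher slowly tracks the student
    return loss.item()

# Usage: teacher = copy.deepcopy(student), then call self_training_step(...).
```

The EMA update keeps the teacher's pseudo labels more stable than the rapidly changing student, which is what makes the self-teaching loop usable in practice.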
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both the pose and joint level (one such measure is sketched after this entry).
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
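One plausible reading of quantifying uncertainty "at both pose and joint level" with two heads is their mutual disagreement; the functions below are an illustrative proxy for that idea, not MRP-Net's actual measures.

```python
# Illustrative two-head disagreement as an uncertainty proxy
# (an interpretation of the idea above, not MRP-Net's actual measure).
import torch

def joint_uncertainty(j3d_head_a: torch.Tensor, j3d_head_b: torch.Tensor):
    """Per-joint uncertainty as Euclidean disagreement between two heads.

    Both inputs have shape (batch, joints, 3); returns (batch, joints).
    """
    return torch.linalg.norm(j3d_head_a - j3d_head_b, dim=-1)

def pose_uncertainty(j3d_head_a, j3d_head_b):
    """Pose-level uncertainty: mean per-joint disagreement per sample."""
    return joint_uncertainty(j3d_head_a, j3d_head_b).mean(dim=-1)

# Example: keep only low-uncertainty samples when adapting without labels.
a, b = torch.randn(8, 17, 3), torch.randn(8, 17, 3)
reliable = pose_uncertainty(a, b) < 0.5
```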
- THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3D pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3D reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Domain gaps between training and test data often degrade model performance.
We present our Adapted Human Pose (AHuP) approach, which addresses adaptation in both the appearance and pose spaces.
AHuP is built on the practical assumption that, in real applications, target-domain data may be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.