Efficient Domain Adaptation via Generative Prior for 3D Infant Pose
Estimation
- URL: http://arxiv.org/abs/2311.12043v1
- Date: Fri, 17 Nov 2023 20:49:37 GMT
- Title: Efficient Domain Adaptation via Generative Prior for 3D Infant Pose
Estimation
- Authors: Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li,
Jenq-Neng Hwang
- Abstract summary: 3D human pose estimation has seen impressive progress in recent years, but only a few works focus on infants, who have different bone proportions and for whom data is limited.
Here, we show that our model attains state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.
We also show that our method, ZeDO-i, achieves efficient domain adaptation even when only a small amount of data is available.
- Score: 29.037799937729687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although 3D human pose estimation has seen impressive progress in
recent years, only a few works focus on infants, who have different bone
proportions and for whom data is limited. Directly applying adult pose
estimation models typically yields low performance in the infant domain and
suffers from out-of-distribution issues. Moreover, the difficulty of
collecting infant pose data heavily constrains the efficiency of
learning-based models that lift 2D poses to 3D. To deal with the issues of
small datasets, domain adaptation and data augmentation are commonly used
techniques. Following this paradigm, we take advantage of an
optimization-based method that utilizes generative priors to predict 3D
infant keypoints from 2D keypoints without the need for large training data.
We further apply a guided diffusion model to adapt 3D adult poses to the
infant domain in order to supplement small datasets. Besides, we show that
our method, ZeDO-i, achieves efficient domain adaptation even when only a
small amount of data is available. Quantitatively, our model attains
state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and
21.2 mm on the MINI-RGBD dataset.
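As a concrete illustration of the two quantities above, the sketch below shows (a) the MPJPE metric reported for both benchmarks and (b) a toy version of optimization-based 2D-to-3D lifting, in which a 3D pose is iteratively adjusted so that its pinhole-camera projection matches the observed 2D keypoints. The joint count, camera model, and plain gradient descent are illustrative assumptions only; the paper's ZeDO-i additionally constrains the optimization with a learned generative prior, which is omitted here.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints, in the input's units (e.g. mm)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def lift_2d_to_3d(kp2d, init3d, focal=1.0, steps=500, lr=5.0):
    """Toy optimization-based lifting: gradient descent on the squared
    reprojection error of a pinhole camera, proj = (x, y) * f / z.
    (ZeDO-i would additionally score candidate poses with a generative
    prior; only the reprojection term is modeled here.)"""
    pose = init3d.astype(float).copy()          # (J, 3) joint positions
    for _ in range(steps):
        proj = pose[:, :2] * focal / pose[:, 2:3]
        residual = proj - kp2d                  # (J, 2) reprojection error
        grad = np.zeros_like(pose)
        # d(0.5 * ||residual||^2) / d(x, y)
        grad[:, :2] = residual * focal / pose[:, 2:3]
        # d / dz, via the chain rule through proj = (x, y) * f / z
        grad[:, 2] = -(residual * pose[:, :2]).sum(1) * focal / pose[:, 2] ** 2
        pose -= lr * grad
    return pose
```

Note that the reprojection term alone cannot resolve monocular depth/scale ambiguity (scaling a pose and its distance together leaves the projection unchanged), which is precisely why a generative pose prior, with bone lengths adapted to infants, is needed on top of this objective.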
Related papers
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Pretrained Deep 2.5D Models for Efficient Predictive Modeling from
Retinal OCT [7.8641166297532035]
3D deep learning models play a crucial role in building powerful predictive models of disease progression.
In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers.
We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period.
arXiv Detail & Related papers (2023-07-25T23:46:48Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
- Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery [32.73513554145019]
Domain Adaptive 3D Pose Augmentation (DAPA) is a data augmentation method that enhances the model's generalization ability in in-the-wild scenarios.
We show quantitatively that finetuning with DAPA effectively improves results on benchmarks 3DPW and AGORA.
arXiv Detail & Related papers (2022-06-21T15:02:31Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net, which combines a shared deep network backbone with two output heads corresponding to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D
Pose Data [14.719976311208502]
Training vs. test data domain gaps often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around the practical assumption that, in real applications, data from the target domain may be inaccessible or only limited information can be acquired.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- LiftFormer: 3D Human Pose Estimation using attention models [0.0]
We propose using attention-based models to obtain more accurate 3D predictions by leveraging attention mechanisms over ordered sequences of human poses in videos.
Our method consistently outperforms the previous best results from the literature on Human3.6M, both with 2D keypoint predictors by 0.3 mm (44.8 MPJPE, 0.7% improvement) and with ground-truth inputs by 2 mm (31.9 MPJPE, 8.4% improvement).
Our 3D lifting model's accuracy exceeds that of other end-to-end or SMPL approaches and is comparable to many multi-view methods.
arXiv Detail & Related papers (2020-09-01T11:05:45Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary
training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.