CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by
Leveraging In-the-wild 2D Annotations
- URL: http://arxiv.org/abs/2301.02979v1
- Date: Sun, 8 Jan 2023 05:07:41 GMT
- Title: CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by
Leveraging In-the-wild 2D Annotations
- Authors: Cheng-Yen Yang, Jiajia Luo, Lu Xia, Yuyin Sun, Nan Qiao, Ke Zhang,
Zhongyu Jiang, Jenq-Neng Hwang
- Abstract summary: We present CameraPose, a weakly-supervised framework for 3D human pose estimation from a single image.
By adding a camera parameter branch, any in-the-wild 2D annotations can be fed into our pipeline to boost the training diversity.
We also introduce a refinement network module with confidence-guided loss to further improve the quality of noisy 2D keypoints extracted by 2D pose estimators.
- Score: 25.05308239278207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the generalization of 3D human pose estimators, many existing deep
learning based models focus on adding different augmentations to training
poses. However, data augmentation techniques are limited to the "seen" pose
combinations and struggle to infer poses with rare "unseen" joint positions. To
address this problem, we present CameraPose, a weakly-supervised framework for
3D human pose estimation from a single image, which can be applied not only to
2D-3D pose pairs but also to 2D-only annotations. By adding a camera parameter
branch, any in-the-wild 2D annotations can be fed into our pipeline to boost
the training diversity and the 3D poses can be implicitly learned by
reprojecting back to 2D. Moreover, CameraPose introduces a refinement network
module with confidence-guided loss to further improve the quality of noisy 2D
keypoints extracted by 2D pose estimators. Experimental results demonstrate
that CameraPose brings clear improvements on cross-scenario datasets. Notably,
it outperforms the baseline method by 3 mm on 3DPW, the most challenging
dataset. In addition, by combining our proposed refinement network module
with existing 3D pose estimators, their performance can be improved in
cross-scenario evaluation.
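The two ideas in the abstract, learning 3D poses implicitly by reprojecting them to 2D through a predicted camera, and down-weighting noisy detector keypoints by their confidence, can be sketched as follows. This is a minimal illustration under assumed conventions (a pinhole camera with predicted focal lengths, principal point, rotation, and translation; an L1 reprojection loss), not the paper's actual implementation:

```python
import numpy as np

def project_to_2d(pose_3d, f, c, R, t):
    """Perspective-project a 3D pose (J, 3) to 2D pixels using predicted
    camera parameters: focal lengths f (2,), principal point c (2,),
    rotation R (3, 3), translation t (3,)."""
    cam = pose_3d @ R.T + t          # joints in camera coordinates
    xy = cam[:, :2] / cam[:, 2:3]    # perspective divide by depth
    return xy * f + c                # apply intrinsics

def confidence_guided_loss(pred_2d, target_2d, conf):
    """L1 reprojection loss in which each joint is weighted by the 2D
    detector's confidence score conf (J,), so noisy keypoints contribute
    less to the gradient."""
    per_joint = np.abs(pred_2d - target_2d).sum(axis=1)  # (J,)
    return (conf * per_joint).sum() / conf.sum()
```

With 2D-only annotations, only `confidence_guided_loss` on the reprojection is available as a training signal, which is what lets in-the-wild 2D data enter the pipeline.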
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases up to 45% in MPJPE errors compared to the 3D pose obtained by triangulating the 2D poses.
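The triangulation baseline that MPL compares against can be sketched with the standard linear (DLT) method, recovering a 3D point from two calibrated views; this is an illustrative textbook version, not MPL's pipeline:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: (3, 4) camera projection matrices; x1, x2: (2,) pixel coords."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)      # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]              # dehomogenize
```

MPJPE (mean per-joint position error) is then the mean Euclidean distance between such triangulated joints and the ground truth.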
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency [0.493599216374976]
We propose a novel loss function, multiview consistency, to enable adding additional training data with only 2D supervision.
Our experiments demonstrate that two views offset by 90 degrees are enough to obtain good performance, with only marginal improvements by adding more views.
This research introduces new possibilities for domain adaptation in 3D pose estimation, providing a practical and cost-effective solution to customize models for specific applications.
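A multiview-consistency loss of the kind described above can be sketched as follows: 3D poses predicted independently from two views should agree once rotated into a common frame by the known relative camera rotation. This is a hedged illustration of the general idea, not the paper's exact formulation:

```python
import numpy as np

def multiview_consistency_loss(pose_a, pose_b, R_ab):
    """Consistency between 3D poses (J, 3) predicted from two views:
    rotate view A's prediction into view B's frame with the relative
    rotation R_ab (3, 3) and penalize per-joint disagreement."""
    aligned = pose_a @ R_ab.T           # view-A prediction in view-B frame
    return np.mean(np.linalg.norm(aligned - pose_b, axis=1))
```

With two views offset by 90 degrees, `R_ab` is a quarter-turn rotation, matching the setup the paper reports as sufficient.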
arXiv Detail & Related papers (2023-11-21T08:21:55Z)
- Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge [36.65402869749077]
We propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision.
We propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
WSP achieves state-of-the-art results on two widely-used benchmarks.
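Distinguishing the depth relationship between two points, as WSP does, is often trained with a pairwise ordinal-ranking objective. The sketch below is an illustrative logistic ranking loss of that kind, not WSP's exact formulation:

```python
import numpy as np

def ordinal_depth_loss(z, pairs, order):
    """Pairwise depth-ordering loss: for each joint pair (i, j) with label
    order = +1 when joint i should be closer than joint j (and -1
    otherwise), penalize ordering violations with a logistic loss."""
    total = 0.0
    for (i, j), r in zip(pairs, order):
        # small when the predicted depths z respect the labeled ordering
        total += np.log1p(np.exp(-r * (z[j] - z[i])))
    return total / len(pairs)
```

Such ordinal labels require only 2D images with relative-depth annotations, which is what makes the supervision weak rather than full 3D.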
arXiv Detail & Related papers (2022-11-22T03:35:15Z)
- ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses [23.554957518485324]
We propose an unsupervised approach that learns to predict a 3D human pose from a single image.
We estimate the 3D pose that is most likely over random projections, with the likelihood estimated using normalizing flows on 2D poses.
We outperform the state-of-the-art unsupervised human pose estimation methods on the benchmark datasets Human3.6M and MPI-INF-3DHP in many metrics.
arXiv Detail & Related papers (2021-12-14T01:12:45Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- SVMA: A GAN-based model for Monocular 3D Human Pose Estimation [0.8379286663107844]
We present an unsupervised GAN-based model to recover 3D human pose from 2D joint locations extracted from a single image.
Considering the reprojection constraint, our model can estimate the camera so that we can reproject the estimated 3D pose to the original 2D pose.
Results on Human3.6M show that our method outperforms all the state-of-the-art methods, and results on MPI-INF-3DHP show that our method outperforms state-of-the-art by approximately 15.0%.
arXiv Detail & Related papers (2021-06-10T09:43:57Z)
- PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation [83.50127973254538]
Existing 3D human pose estimators suffer poor generalization performance to new datasets.
We present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity.
arXiv Detail & Related papers (2021-05-06T06:57:42Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train, from scratch, 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.