Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2307.03833v3
- Date: Tue, 24 Oct 2023 20:46:24 GMT
- Title: Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
- Authors: Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang,
Jenq-Neng Hwang
- Abstract summary: Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods.
We propose textbfZero-shot textbfDiffusion-based textbfOptimization (textbfZeDO) pipeline for 3D HPE.
Our multi-hypothesis textittextbfZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE $51.4$
- Score: 29.037799937729687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based methods have dominated the 3D human pose estimation (HPE)
tasks with significantly better performance in most benchmarks than traditional
optimization-based methods. Nonetheless, 3D HPE in the wild is still the
biggest challenge for learning-based models, whether with 2D-3D lifting,
image-to-3D, or diffusion-based methods, since the trained networks implicitly
learn camera intrinsic parameters and domain-based 3D human pose distributions
and estimate poses by statistical average. On the other hand, the
optimization-based methods estimate results case-by-case, which can predict
more diverse and sophisticated human poses in the wild. By combining the
advantages of optimization-based and learning-based methods, we propose the
\textbf{Ze}ro-shot \textbf{D}iffusion-based \textbf{O}ptimization
(\textbf{ZeDO}) pipeline for 3D HPE to solve the problem of cross-domain and
in-the-wild 3D HPE. Our multi-hypothesis \textit{\textbf{ZeDO}} achieves
state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE $51.4$mm,
without training with any 2D-3D or image-3D pairs. Moreover, our
single-hypothesis \textit{\textbf{ZeDO}} achieves SOTA performance on 3DPW
dataset with PA-MPJPE $40.3$mm on cross-dataset evaluation, which even
outperforms learning-based methods trained on 3DPW.
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [68.75387874066647]
We propose an Uncertainty-Aware testing-time optimization framework for 3D human pose estimation.
Our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z) - 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose
Estimation [28.24765523800196]
We propose 3D-aware Neural Body Fitting (3DNBF) for 3D human pose estimation.
In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels emitting 3D pose-dependent feature vectors.
The neural features are trained with contrastive learning to become 3D-aware and hence to overcome the 2D-3D ambiguity.
arXiv Detail & Related papers (2023-08-19T22:41:00Z) - Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis
Aggregation [64.874000550443]
A Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed.
The proposed JPMA assembles multiple hypotheses generated by D3DP into a single 3D pose for practical use.
Our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
arXiv Detail & Related papers (2023-03-21T04:00:47Z) - AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by
Learnable Motion Generation [24.009674750548303]
Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop.
We propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset.
Our method outperforms previous work in cross-dataset evaluations by 14% and previous semi-supervised learning methods that use partial 3D annotations by 16%.
arXiv Detail & Related papers (2021-12-22T00:27:52Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Cascaded deep monocular 3D human pose estimation with evolutionary
training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and synthesizings inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z) - Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.