DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models
- URL: http://arxiv.org/abs/2211.16487v1
- Date: Tue, 29 Nov 2022 18:55:13 GMT
- Title: DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models
- Authors: Karl Holmquist and Bastian Wandt
- Abstract summary: We propose emphDiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
- Score: 5.908471365011943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditionally, monocular 3D human pose estimation employs a machine learning
model to predict the most likely 3D pose for a given input image. However, a
single image can be highly ambiguous and induces multiple plausible solutions
for the 2D-3D lifting step which results in overly confident 3D pose
predictors. To this end, we propose \emph{DiffPose}, a conditional diffusion
model, that predicts multiple hypotheses for a given input image. In comparison
to similar approaches, our diffusion model is straightforward and avoids
intensive hyperparameter tuning, complex network structures, mode collapse, and
unstable training. Moreover, we tackle a problem of the common two-step
approach that first estimates a distribution of 2D joint locations via
joint-wise heatmaps and consecutively approximates them based on first- or
second-moment statistics. Since such a simplification of the heatmaps removes
valid information about possibly correct, though labeled unlikely, joint
locations, we propose to represent the heatmaps as a set of 2D joint candidate
samples. To extract information about the original distribution from these
samples we introduce our \emph{embedding transformer} that conditions the
diffusion model. Experimentally, we show that DiffPose slightly improves upon
the state of the art for multi-hypothesis pose estimation for simple poses and
outperforms it by a large margin for highly ambiguous poses.
Related papers
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z) - Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D
Human Pose Estimaiton [27.708016152889787]
Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses.
Most of the hypotheses generated deviate substantially from the true pose.
Compared to deterministic models, the excessive uncertainty in probabilistic models leads to weaker performance in single-hypothesis prediction.
We propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion.
arXiv Detail & Related papers (2024-01-10T04:07:50Z) - ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z) - HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds
for Human Pose and Shape Distribution Estimation [27.14060158187953]
Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image.
We show that these approaches exhibit a trade-off between three key properties.
Our method, HuManiFlow, predicts simultaneously accurate, consistent and diverse distributions.
arXiv Detail & Related papers (2023-05-11T16:49:19Z) - Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis
Aggregation [64.874000550443]
A Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed.
The proposed JPMA assembles multiple hypotheses generated by D3DP into a single 3D pose for practical use.
Our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
arXiv Detail & Related papers (2023-03-21T04:00:47Z) - A generic diffusion-based approach for 3D human pose prediction in the
wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging-temporal task.
We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise and propose a conditional diffusion model that denoises them and forecasts plausible poses.
We investigate our findings on four standard datasets and obtain significant improvements over the state-of-the-art.
arXiv Detail & Related papers (2022-10-11T17:59:54Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Weakly Supervised Generative Network for Multiple 3D Human Pose
Hypotheses [74.48263583706712]
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth.
We propose a weakly supervised deep generative network to address the inverse problem.
arXiv Detail & Related papers (2020-08-13T09:26:01Z) - Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View
Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.