MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2111.12707v1
- Date: Wed, 24 Nov 2021 18:59:02 GMT
- Authors: Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool
- Abstract summary: Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion.
We propose Multi-Hypothesis Transformer (MHFormer) that learns to represent multiple plausible pose hypotheses.
MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP.
- Score: 88.73883883964048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D human poses from monocular videos is a challenging task due to
depth ambiguity and self-occlusion. Most existing works attempt to solve both
issues by exploiting spatial and temporal relationships. However, those works
ignore the fact that it is an inverse problem where multiple feasible solutions
(i.e., hypotheses) exist. To relieve this limitation, we propose a
Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal
representations of multiple plausible pose hypotheses. In order to effectively
model multi-hypothesis dependencies and build strong relationships across
hypothesis features, the task is decomposed into three stages: (i) Generate
multiple initial hypothesis representations; (ii) Model self-hypothesis
communication, merge multiple hypotheses into a single converged representation
and then partition it into several diverged hypotheses; (iii) Learn
cross-hypothesis communication and aggregate the multi-hypothesis features to
synthesize the final 3D pose. Through the above processes, the final
representation is enhanced and the synthesized pose is much more accurate.
Extensive experiments show that MHFormer achieves state-of-the-art results on
two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and
whistles, its performance surpasses the previous best result by a large margin
of 3% on Human3.6M. Code and models are available at
https://github.com/Vegetebird/MHFormer.
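The three stages described in the abstract can be illustrated with a toy NumPy sketch of the data flow. This is not the authors' implementation (see the repository above for that). The function name `toy_multi_hypothesis_lift`, the random untrained weights, the layer sizes, and the choice to predict only the centre frame are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over the token (second-to-last) axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def toy_multi_hypothesis_lift(pose_2d, num_hyp=3, dim=16, seed=0):
    """Hypothetical sketch: (frames, joints, 2) -> (joints, 3) for the centre frame.

    Weights are random stand-ins for learned layers; num_hyp must be >= 2.
    """
    rng = np.random.default_rng(seed)
    frames, joints, _ = pose_2d.shape
    x = pose_2d.reshape(frames, joints * 2)

    # Stage (i): generate several initial hypothesis representations.
    w_in = rng.normal(size=(num_hyp, joints * 2, dim)) * 0.1
    hyps = np.einsum("fi,mid->mfd", x, w_in)  # (num_hyp, frames, dim)

    # Stage (ii): self-hypothesis communication (attention across frames,
    # applied to each hypothesis independently) ...
    hyps = hyps + attention(hyps, hyps, hyps)
    # ... then merge into one converged representation and partition it
    # back into several diverged hypotheses.
    merged = np.concatenate(list(hyps), axis=-1)  # (frames, num_hyp * dim)
    w_mix = rng.normal(size=(num_hyp * dim, num_hyp * dim)) * 0.1
    mixed = merged + np.tanh(merged @ w_mix)
    hyps = np.stack(np.split(mixed, num_hyp, axis=-1))

    # Stage (iii): cross-hypothesis communication (each hypothesis queries
    # the tokens of all other hypotheses) ...
    updated = []
    for m in range(num_hyp):
        others = np.concatenate(
            [hyps[j] for j in range(num_hyp) if j != m], axis=0
        )
        updated.append(hyps[m] + attention(hyps[m], others, others))
    hyps = np.stack(updated)
    # ... then aggregate the hypotheses and regress the 3D pose.
    fused = np.concatenate(list(hyps), axis=-1)
    w_out = rng.normal(size=(num_hyp * dim, joints * 3)) * 0.1
    return (fused[frames // 2] @ w_out).reshape(joints, 3)
```

Calling the function on a `(9, 17, 2)` array of 2D keypoints returns a `(17, 3)` array. Because the weights are untrained, the output only demonstrates the shapes and the order of operations, not a meaningful pose.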
Related papers
- Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation [6.061037203796638]
Platypose is a framework that uses a diffusion model pretrained on 3D human motion sequences for zero-shot 3D pose sequence estimation.
Platypose achieves state-of-the-art calibration and competitive joint error when tested on static poses from Human3.6M, MPI-INF-3DHP and 3DPW.
arXiv Detail & Related papers (2024-03-10T10:30:34Z)
- Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation [27.708016152889787]
Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses.
Most of the hypotheses generated deviate substantially from the true pose.
Compared to deterministic models, the excessive uncertainty in probabilistic models leads to weaker performance in single-hypothesis prediction.
We propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion.
arXiv Detail & Related papers (2024-01-10T04:07:50Z)
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [71.2556016049579]
ManiPose is a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting.
By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses.
We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
- Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation [64.874000550443]
A Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed.
The proposed JPMA assembles multiple hypotheses generated by D3DP into a single 3D pose for practical use.
Our method outperforms the state-of-the-art deterministic and probabilistic approaches by 1.5% and 8.9%, respectively.
arXiv Detail & Related papers (2023-03-21T04:00:47Z)
- DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z)
- AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed AdaptivePose.
We apply AdaptivePose to both 2D and 3D multi-person pose estimation tasks to verify its effectiveness.
arXiv Detail & Related papers (2022-10-08T12:54:20Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses [74.48263583706712]
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth.
We propose a weakly supervised deep generative network to address the inverse problem.
arXiv Detail & Related papers (2020-08-13T09:26:01Z)