Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior
- URL: http://arxiv.org/abs/2410.14540v1
- Date: Fri, 18 Oct 2024 15:29:19 GMT
- Title: Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior
- Authors: Calvin-Khang Ta, Arindam Dutta, Rohit Kundu, Rohit Lal, Hannah Dela Cruz, Dripta S. Raychaudhuri, Amit Roy-Chowdhury
- Abstract summary: MOPED is the first method to leverage a novel multi-modal conditional diffusion model as a prior for SMPL pose parameters.
Our method offers powerful unconditional pose generation with the ability to condition on multi-modal inputs such as images and text.
- Score: 8.314155285516073
- Abstract: The Skinned Multi-Person Linear (SMPL) model plays a crucial role in 3D human pose estimation, providing a streamlined yet effective representation of the human body. However, ensuring the validity of SMPL configurations during tasks such as human mesh regression remains a significant challenge, highlighting the necessity for a robust human pose prior capable of discerning realistic human poses. To address this, we introduce MOPED: Multi-mOdal PosE Diffuser. MOPED is the first method to leverage a novel multi-modal conditional diffusion model as a prior for SMPL pose parameters. Our method offers powerful unconditional pose generation with the ability to condition on multi-modal inputs such as images and text. This capability enhances the applicability of our approach by incorporating additional context often overlooked in traditional pose priors. Extensive experiments across three distinct tasks (pose estimation, pose denoising, and pose completion) demonstrate that our multi-modal diffusion model-based prior significantly outperforms existing methods. These results indicate that our model captures a broader spectrum of plausible human poses.
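For orientation, a conditional diffusion prior of the kind the abstract describes is often written in the standard denoising-diffusion form below. This is a generic sketch, not a formulation confirmed by the paper: the symbols $\theta_0$ (clean SMPL pose parameters), $c$ (an image or text embedding, or a null token for unconditional use), $\bar{\alpha}_t$ (cumulative noise schedule), and $\epsilon_\phi$ (the denoising network) are all assumed notation.

$$
q(\theta_t \mid \theta_0) = \mathcal{N}\!\left(\theta_t;\ \sqrt{\bar{\alpha}_t}\,\theta_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad
\mathcal{L}(\phi) = \mathbb{E}_{\theta_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0,\mathbf{I})}\left\| \epsilon - \epsilon_\phi(\theta_t, t, c) \right\|_2^2
$$

Under this reading, sampling with the null condition would give unconditional poses, while supplying $c$ would steer generation toward poses consistent with the image or text.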
Related papers
- $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation [17.281031933210762]
We introduce the Discrete Diffusion Pose ($\text{Di}^2\text{Pose}$), a novel framework designed for occluded 3D human pose estimation.
$\text{Di}^2\text{Pose}$ employs a two-stage process: it first converts 3D poses into a discrete representation through a pose quantization step.
This design confines the search space to physically viable configurations.
arXiv Detail & Related papers (2024-05-27T10:01:36Z)
- Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence [47.16903508897047]
In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states.
We introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations.
In addition, our inertia-aware 3D human method can unprecedentedly simulate appearance changes caused by inertia at different velocities.
arXiv Detail & Related papers (2024-03-28T06:05:14Z)
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
- DPoser: Diffusion Model as Robust 3D Human Pose Prior [51.75784816929666]
We introduce DPoser, a robust and versatile human pose prior built upon diffusion models.
DPoser regards various pose-centric tasks as inverse problems and employs variational diffusion sampling for efficient solving.
Our approach demonstrates considerable enhancements over common uniform scheduling used in image domains, boasting improvements of 5.4%, 17.2%, and 3.8% across human mesh recovery, pose completion, and motion denoising, respectively.
arXiv Detail & Related papers (2023-12-09T11:18:45Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields [47.62275563070933]
We present a continuous model for plausible human poses based on neural distance fields (NDFs).
Pose-NDF learns a manifold of plausible poses as the zero level set of a neural implicit function.
It can be used to generate more diverse poses by random sampling and projection than VAE-based methods.
arXiv Detail & Related papers (2022-07-27T21:46:47Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.