DPoser: Diffusion Model as Robust 3D Human Pose Prior
- URL: http://arxiv.org/abs/2312.05541v2
- Date: Sat, 23 Mar 2024 04:54:21 GMT
- Title: DPoser: Diffusion Model as Robust 3D Human Pose Prior
- Authors: Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Yulun Zhang, Haoqian Wang
- Abstract summary: We introduce DPoser, a robust and versatile human pose prior built upon diffusion models.
DPoser regards various pose-centric tasks as inverse problems and employs variational diffusion sampling for efficient solving.
Our approach demonstrates considerable enhancements over common uniform scheduling used in image domains, boasting improvements of 5.4%, 17.2%, and 3.8% across human mesh recovery, pose completion, and motion denoising, respectively.
- Score: 51.75784816929666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work aims to construct a robust human pose prior, which remains a persistent challenge due to biomechanical constraints and diverse human movements. Traditional priors such as VAEs and NDFs often fall short in realism and generalization, notably on unseen noisy poses. To address these issues, we introduce DPoser, a robust and versatile human pose prior built upon diffusion models. DPoser regards various pose-centric tasks as inverse problems and employs variational diffusion sampling to solve them efficiently. Built into optimization frameworks, DPoser thus seamlessly benefits human mesh recovery, pose generation, pose completion, and motion denoising. Furthermore, owing to the disparity between articulated poses and structured images, we propose truncated timestep scheduling to enhance the effectiveness of DPoser. Our approach yields considerable gains over the uniform scheduling common in image domains, with improvements of 5.4%, 17.2%, and 3.8% on human mesh recovery, pose completion, and motion denoising, respectively. Comprehensive experiments demonstrate the superiority of DPoser over existing state-of-the-art pose priors across multiple tasks.
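To make the mechanics of the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of the general recipe it describes: pose completion treated as an inverse problem, optimized under a data-fidelity term plus a score-distillation-style regularizer from a pretrained pose diffusion model, with timesteps drawn only from a truncated range instead of uniformly over the full schedule. Everything named here is an assumption for illustration; `ToyScoreNet`, the linear noise schedule, the cutoff `t_max = 0.3 * T`, and the loss weights are invented and do not reproduce DPoser's actual objective, truncation range, or hyperparameters.

```python
import torch

pose_dim, T = 63, 1000                      # e.g. 21 body joints x 3 axis-angle values
betas = torch.linspace(1e-4, 2e-2, T)       # standard linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class ToyScoreNet(torch.nn.Module):
    """Stand-in for a pretrained pose diffusion model that predicts noise."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(pose_dim + 1, 256), torch.nn.SiLU(),
            torch.nn.Linear(256, pose_dim))
    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t.float().view(-1, 1) / T], dim=-1))

score_net = ToyScoreNet()                   # assume pretrained weights in practice

# Pose completion as an inverse problem: only some joints are observed.
mask = (torch.rand(pose_dim) > 0.5).float()
y = torch.randn(pose_dim) * mask            # toy partial observation

x = torch.zeros(pose_dim, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)

t_max = int(0.3 * T)                        # truncated scheduling: low-noise timesteps only
for step in range(200):
    opt.zero_grad()
    data_loss = ((x * mask - y) ** 2).sum()          # fidelity on observed joints

    # Score-distillation-style prior term at a randomly drawn (truncated) timestep.
    t = torch.randint(1, t_max, (1,))
    a_bar = alphas_bar[t]
    eps = torch.randn(pose_dim)
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps
    eps_pred = score_net(x_t.unsqueeze(0), t).squeeze(0)
    prior_grad = (eps_pred - eps).detach()           # stop-gradient through the network
    prior_loss = (prior_grad * x).sum()              # d(prior_loss)/dx == prior_grad

    (data_loss + 0.1 * prior_loss).backward()
    opt.step()
```

The only scheduling-specific ingredient is the truncation: replacing `torch.randint(1, t_max, (1,))` with `torch.randint(1, T, (1,))` would recover the uniform timestep scheduling that the abstract reports improvements over.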
Related papers
- Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior [8.314155285516073]
MOPED is the first method to leverage a novel multi-modal conditional diffusion model as a prior for SMPL pose parameters.
Our method offers powerful unconditional pose generation with the ability to condition on multi-modal inputs such as images and text.
arXiv Detail & Related papers (2024-10-18T15:29:19Z)
- Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence [47.16903508897047]
In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states.
We introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations.
In addition, our inertia-aware 3D human modeling method can, for the first time, simulate appearance changes caused by inertia at different velocities.
arXiv Detail & Related papers (2024-03-28T06:05:14Z)
- Multi-Human Mesh Recovery with Transformers [5.420974192779563]
We introduce a new model with a streamlined transformer-based design, featuring three critical design choices: multi-scale feature incorporation, focused attention mechanisms, and relative joint supervision.
Our proposed model demonstrates a significant performance improvement, surpassing state-of-the-art region-based and whole-image-based methods on various benchmarks involving multiple individuals.
arXiv Detail & Related papers (2024-02-26T18:28:05Z)
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
- Motion-DVAE: Unsupervised learning for fast human motion denoising [18.432026846779372]
We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion.
Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches.
arXiv Detail & Related papers (2023-06-09T12:18:48Z)
- Proactive Multi-Camera Collaboration For 3D Human Pose Estimation [16.628446718419344]
This paper presents a multi-agent reinforcement learning scheme for proactive Multi-Camera Collaboration in 3D Human Pose Estimation.
Active camera approaches proactively control camera poses to find optimal viewpoints for 3D reconstruction.
We jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics.
arXiv Detail & Related papers (2023-03-07T10:01:00Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness frequently degrades in denser crowds.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
The generalizability of human pose estimation models trained with supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework that is not restrained by any paired or unpaired weak supervision.
Our proposed model employs three consecutive differentiable transformations, named forward-kinematics, camera-projection, and spatial-map transformation (a generic sketch of the first two follows this entry).
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
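The last entry above lists three consecutive differentiable transformations. Below is a generic, hypothetical PyTorch sketch of the first two, forward kinematics over a small joint hierarchy followed by a pinhole camera projection, illustrating how 2D re-projections remain differentiable with respect to per-joint rotations; the 4-joint chain, bone offsets, focal length, and camera depth are invented for illustration and are not the paper's skeleton, spatial-map transformation, or training setup.

```python
import torch

parents = [-1, 0, 1, 2]                 # tiny kinematic chain: root plus 3 children
offsets = torch.tensor([[0., 0., 0.],   # bone offsets in each parent's frame
                        [0., 0.3, 0.],
                        [0., 0.3, 0.],
                        [0., 0.25, 0.]])

def axis_angle_to_matrix(aa):
    """Rodrigues' formula; aa is (J, 3), returns (J, 3, 3) rotation matrices."""
    theta = aa.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = aa / theta
    K = torch.zeros(aa.shape[0], 3, 3)          # skew-symmetric cross-product matrices
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3).expand_as(K)
    s, c = theta.sin().unsqueeze(-1), theta.cos().unsqueeze(-1)
    return I + s * K + (1 - c) * (K @ K)

def forward_kinematics(aa):
    """Local axis-angle rotations (J, 3) -> global joint positions (J, 3)."""
    R_local = axis_angle_to_matrix(aa)
    R_global, p_global = [], []
    for j, parent in enumerate(parents):
        if parent < 0:
            R_global.append(R_local[j]); p_global.append(offsets[j])
        else:
            R_global.append(R_global[parent] @ R_local[j])
            p_global.append(p_global[parent] + R_global[parent] @ offsets[j])
    return torch.stack(p_global)

def project(points_3d, focal=1000.0, depth=3.0):
    """Pinhole projection after pushing the skeleton in front of the camera."""
    z = points_3d[:, 2:3] + depth
    return focal * points_3d[:, :2] / z

pose = (0.1 * torch.randn(4, 3)).requires_grad_()   # axis-angle per joint
joints_2d = project(forward_kinematics(pose))
joints_2d.sum().backward()                          # gradients reach the pose parameters
```

Because every step is composed of differentiable tensor operations, a loss on the projected 2D joints can propagate gradients back to the axis-angle pose, which is the property such self-supervised pipelines rely on.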
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.