LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion
- URL: http://arxiv.org/abs/2509.25739v1
- Date: Tue, 30 Sep 2025 03:50:56 GMT
- Title: LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion
- Authors: Donghwan Kim, Tae-Kyun Kim
- Abstract summary: We tackle the problem of Human Mesh Recovery from a single RGB image. While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches have regressed a single deterministic output. We propose a novel approach that models a distribution well aligned with 2D observations.
- Score: 29.608043710963162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We tackle the problem of Human Mesh Recovery (HMR) from a single RGB image, formulating it as image-conditioned human pose and shape generation. While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches regress a single deterministic output. Probabilistic methods attempt to address this by generating multiple plausible outputs to model the ambiguity. However, these methods often exhibit a trade-off between accuracy and sample diversity, and their single predictions are not competitive with state-of-the-art deterministic models. To overcome these limitations, we propose a novel approach that models a distribution well aligned with 2D observations. In particular, we introduce an $SO(3)$ diffusion model, which generates the distribution of pose parameters, represented as 3D rotations, both unconditionally and conditioned on image observations via conditioning dropout. Our model learns the hierarchical structure of human body joints using a transformer. Instead of using the transformer as a denoising model, a time-independent transformer extracts latent vectors for the joints, and a small MLP-based denoising model learns the per-joint distribution conditioned on the latent vector. We experimentally demonstrate and analyze that our model effectively predicts accurate pose probability distributions.
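The two ingredients named in the abstract, noise applied to 3D rotations and conditioning dropout, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`so3_exp`, `perturb_rotation`, `drop_condition`) are hypothetical, the Gaussian tangent-space noise is a simple stand-in for whatever noise process the authors use on $SO(3)$, and `None` stands in for a learned null condition.

```python
import math
import random


def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector w (length 3) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    # Skew-symmetric matrix of w (already scaled by the angle).
    K = [[0.0, -w[2], w[1]],
         [w[2], 0.0, -w[0]],
         [-w[1], w[0], 0.0]]
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1 - cos(t))/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def perturb_rotation(R, sigma, rng):
    """Noise a rotation by composing it with exp of a tangent-space sample.

    ASSUMPTION: isotropic Gaussian in the tangent space, used here only as a
    simple stand-in for a proper noise distribution on SO(3).
    """
    noise = [rng.gauss(0.0, sigma) for _ in range(3)]
    return matmul(R, so3_exp(noise))


def drop_condition(cond, p_drop, rng, null=None):
    """Conditioning dropout: with probability p_drop, hide the image condition.

    Training one denoiser on both cases lets it model the unconditional and
    the image-conditioned distribution. `null` is a hypothetical placeholder.
    """
    return null if rng.random() < p_drop else cond


rng = random.Random(0)
R_clean = so3_exp([0.0, 0.0, math.pi / 2])       # 90 degree turn about z
R_noisy = perturb_rotation(R_clean, 0.3, rng)    # still a valid rotation
cond = drop_condition("image_features", 0.1, rng)
```

Because the noise is applied through the exponential map and matrix composition, the perturbed sample stays on the rotation manifold by construction, which is the property that motivates working on $SO(3)$ rather than adding noise to raw matrix entries.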
Related papers
- VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery [75.62565146049015]
We introduce a dual-memory augmented HMR critique agent with self-reflection to produce context-aware quality scores for predicted meshes. These scores distill fine-grained cues about 3D human motion structure, physical feasibility, and alignment with the input image. We propose a group preference alignment framework for finetuning diffusion-based HMR models.
arXiv Detail & Related papers (2026-02-22T13:19:06Z)
- Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton [27.708016152889787]
Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses.
Most of the hypotheses generated deviate substantially from the true pose.
Compared to deterministic models, the excessive uncertainty in probabilistic models leads to weaker performance in single-hypothesis prediction.
We propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion.
arXiv Detail & Related papers (2024-01-10T04:07:50Z)
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [71.2556016049579]
ManiPose is a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
- Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models [33.2565018922113]
This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image.
We propose a generative framework called "Diffusion-based Human Mesh Recovery (Diff-HMR)".
arXiv Detail & Related papers (2023-08-05T22:23:04Z)
- Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision [76.32860119056964]
We propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed.
We demonstrate the effectiveness of our method on three challenging computer vision tasks.
arXiv Detail & Related papers (2023-06-20T17:53:00Z)
- HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation [27.14060158187953]
Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image.
We show that these approaches exhibit a trade-off between three key properties.
Our method, HuManiFlow, predicts simultaneously accurate, consistent and diverse distributions.
arXiv Detail & Related papers (2023-05-11T16:49:19Z)
- DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses [74.48263583706712]
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth.
We propose a weakly supervised deep generative network to address the inverse problem.
arXiv Detail & Related papers (2020-08-13T09:26:01Z)