Related papers: LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion

LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion

URL: http://arxiv.org/abs/2509.25739v1
Date: Tue, 30 Sep 2025 03:50:56 GMT
Title: LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion
Authors: Donghwan Kim, Tae-Kyun Kim,
Abstract summary: We tackle the problem of Human Mesh Recovery from a single RGB image.<n>While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches have regressed a single deterministic output.<n>We propose a novel approach that models well-aligned distribution to 2D observations.
Score: 29.608043710963162
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We tackle the problem of Human Mesh Recovery (HMR) from a single RGB image, formulating it as an image-conditioned human pose and shape generation. While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches have regressed a single deterministic output. Probabilistic methods attempt to address this by generating multiple plausible outputs to model the ambiguity. However, these methods often exhibit a trade-off between accuracy and sample diversity, and their single predictions are not competitive with state-of-the-art deterministic models. To overcome these limitations, we propose a novel approach that models well-aligned distribution to 2D observations. In particular, we introduce $SO(3)$ diffusion model, which generates the distribution of pose parameters represented as 3D rotations unconditional and conditional to image observations via conditioning dropout. Our model learns the hierarchical structure of human body joints using the transformer. Instead of using transformer as a denoising model, the time-independent transformer extracts latent vectors for the joints and a small MLP-based denoising model learns the per-joint distribution conditioned on the latent vector. We experimentally demonstrate and analyze that our model predicts accurate pose probability distribution effectively.

Related papers

VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery [75.62565146049015]
We introduce a dual-memory augmented HMR critique agent with self-reflection to produce context-aware quality scores for predicted meshes.<n>These scores distill fine-grained cues about 3D human motion structure, physical feasibility, and alignment with the input image.<n>We propose a group preference alignment framework for finetuning diffusion-based HMR models.
arXiv Detail & Related papers (2026-02-22T13:19:06Z)
Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton [27.708016152889787]
Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses. Most of the hypotheses generated deviate substantially from the true pose. Compared to deterministic models, the excessive uncertainty in probabilistic models leads to weaker performance in single-hypothesis prediction. We propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion.
arXiv Detail & Related papers (2024-01-10T04:07:50Z)
ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [71.2556016049579]
ManiPose is a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting.<n>By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses.<n>We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency.
arXiv Detail & Related papers (2023-12-11T13:50:10Z)
Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models [33.2565018922113]
This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image. We propose a generative approach framework, called "Diffusion-based Human Mesh Recovery (Diff-HMR)"
arXiv Detail & Related papers (2023-08-05T22:23:04Z)
Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision [76.32860119056964]
We propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. We demonstrate the effectiveness of our method on three challenging computer vision tasks.
arXiv Detail & Related papers (2023-06-20T17:53:00Z)
HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation [27.14060158187953]
Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image. We show that these approaches exhibit a trade-off between three key properties. Our method, HuManiFlow, predicts simultaneously accurate, consistent and diverse distributions.
arXiv Detail & Related papers (2023-05-11T16:49:19Z)
DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose emphDiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence. We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses. We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses [74.48263583706712]
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth. We propose a weakly supervised deep generative network to address the inverse problem.
arXiv Detail & Related papers (2020-08-13T09:26:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.