VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery
- URL: http://arxiv.org/abs/2602.19180v1
- Date: Sun, 22 Feb 2026 13:19:06 GMT
- Title: VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery
- Authors: Wenhao Shen, Hao Wang, Wanqi Yin, Fayao Liu, Xulei Yang, Chao Liang, Zhongang Cai, Guosheng Lin,
- Abstract summary: We introduce a dual-memory augmented HMR critique agent with self-reflection to produce context-aware quality scores for predicted meshes. These scores distill fine-grained cues about 3D human motion structure, physical feasibility, and alignment with the input image. We propose a group preference alignment framework for finetuning diffusion-based HMR models.
- Score: 75.62565146049015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human mesh recovery (HMR) from a single RGB image is inherently ambiguous, as multiple 3D poses can correspond to the same 2D observation. Recent diffusion-based methods tackle this by generating various hypotheses, but often sacrifice accuracy. They yield predictions that are either physically implausible or drift from the input image, especially under occlusion or in cluttered, in-the-wild scenes. To address this, we introduce a dual-memory augmented HMR critique agent with self-reflection to produce context-aware quality scores for predicted meshes. These scores distill fine-grained cues about 3D human motion structure, physical feasibility, and alignment with the input image. We use these scores to build a group-wise HMR preference dataset. Leveraging this dataset, we propose a group preference alignment framework for finetuning diffusion-based HMR models. This process injects the rich preference signals into the model, guiding it to generate more physically plausible and image-consistent human meshes. Extensive experiments demonstrate that our method achieves superior performance compared to state-of-the-art approaches.
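The abstract describes a pipeline in which critique scores for several mesh hypotheses per image are turned into a group-wise preference dataset. The minimal Python sketch below shows one way such groups could be represented. It assumes a GRPO-style normalization of each score against its own group's mean and standard deviation, and also records the best and worst hypothesis indices. The record format, `HypothesisGroup`, and every function name are hypothetical; the abstract does not specify the actual alignment objective.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class HypothesisGroup:
    """Hypothetical record: one input image and the quality scores of its sampled meshes."""
    image_id: str
    scores: list[float]  # one critique score per mesh hypothesis (higher = better)


def group_relative_advantages(scores, eps=1e-6):
    """Normalize scores within one group: (s - mean) / (std + eps).

    Assumption: GRPO-style group normalization; not confirmed by the abstract.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation over the group
    return [(s - mu) / (sigma + eps) for s in scores]


def build_group_preference_dataset(groups):
    """Turn per-image score groups into training records (hypothetical format)."""
    records = []
    for g in groups:
        idx = range(len(g.scores))
        records.append({
            "image_id": g.image_id,
            "advantages": group_relative_advantages(g.scores),
            "best": max(idx, key=lambda i: g.scores[i]),
            "worst": min(idx, key=lambda i: g.scores[i]),
        })
    return records


groups = [HypothesisGroup("img_000", [0.2, 0.8, 0.5])]
dataset = build_group_preference_dataset(groups)
print(dataset[0]["best"], dataset[0]["worst"])
print([round(a, 3) for a in dataset[0]["advantages"]])
```

One motivation for normalizing within each group is that a mesh is judged only against the other hypotheses for the same image, so easy and hard images contribute signals on a comparable scale.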
Related papers
- Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference [69.34278282513593]
Preference Score Distillation (PSD) is an optimization-based framework for human-aligned text-to-3D synthesis without 3D training data.
Our key insight stems from the incompatibility of pixel-level gradients.
We introduce an adaptive strategy to co-optimize preference scores and negative text embeddings.
arXiv Detail & Related papers (2026-03-02T08:23:36Z)
- LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion [29.608043710963162]
We tackle the problem of Human Mesh Recovery from a single RGB image.
While recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches have regressed a single deterministic output.
We propose a novel approach that models a distribution well aligned with the 2D observations.
arXiv Detail & Related papers (2025-09-30T03:50:56Z)
- ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization [51.904899019761594]
We propose ADHMR, a framework that Aligns a Diffusion-based HMR model via preference optimization.
First, we train a human mesh prediction assessment model, HMR-Scorer, capable of evaluating predictions even for in-the-wild images without 3D annotations.
We then use HMR-Scorer to create a preference dataset, where each input image has a pair of winner and loser mesh predictions.
arXiv Detail & Related papers (2025-05-15T13:04:51Z)
- Personalized 3D Human Pose and Shape Refinement [19.082329060985455]
Regression-based methods have dominated the field of 3D human pose and shape estimation.
We propose to construct dense correspondences between initial human model estimates and the corresponding images.
We show that our approach leads not only to consistently better image-model alignment but also to improved 3D accuracy.
arXiv Detail & Related papers (2024-03-18T10:13:53Z)
- Score-Guided Diffusion for 3D Human Recovery [10.562998991986102]
We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction.
ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model.
We evaluate our approach on three settings/applications: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences.
arXiv Detail & Related papers (2024-03-14T17:56:14Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a shared deep network backbone with two output heads that follow two distinct configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Appearance Consensus Driven Self-Supervised Human Mesh Recovery [67.20942777949793]
We present a self-supervised human mesh recovery framework to infer human pose and shape from monocular images.
We achieve state-of-the-art results on the standard model-based 3D pose estimation benchmarks.
The resulting colored mesh prediction opens up the usage of our framework for a variety of appearance-related tasks beyond the pose and shape estimation.
arXiv Detail & Related papers (2020-08-04T05:40:39Z)
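The ADHMR entry in this list builds pairwise data (one winner and one loser mesh per image), in contrast to the group-wise data of the main paper. The minimal Python sketch below pairs that construction with the standard DPO objective on per-sample log-likelihoods. The `scorer` callable stands in for an assessment model such as HMR-Scorer, and all names are hypothetical. Diffusion models do not expose exact log-likelihoods, so diffusion adaptations of DPO typically substitute denoising-error terms; how ADHMR does this is not described in the summary.

```python
import math


def make_preference_pair(predictions, scorer):
    """Return (winner, loser): the highest- and lowest-scored candidate predictions.

    `scorer` is a hypothetical stand-in for an assessment model such as HMR-Scorer.
    """
    ranked = sorted(predictions, key=scorer)
    return ranked[-1], ranked[0]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are log-likelihoods of the winner (w) and loser (l) under the model
    being finetuned and under a frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


scores = {"mesh_a": 0.31, "mesh_b": 0.87, "mesh_c": 0.55}
winner, loser = make_preference_pair(list(scores), scores.get)
print(winner, loser)
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss equals log 2 when the finetuned model and the reference assign the same relative likelihood to winner and loser, and it shrinks as the finetuned model shifts probability toward the winner.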