Related papers: MEGA: Masked Generative Autoencoder for Human Mesh Recovery

URL: http://arxiv.org/abs/2405.18839v2
Date: Fri, 31 May 2024 14:03:07 GMT
Title: MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer
Abstract summary: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem. Most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences.
Score: 33.26995842920877
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as similar 2D projections can correspond to multiple 3D interpretations. Nevertheless, most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. MEGA enables us to propose multiple outputs and to evaluate the uncertainty of the predictions. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.
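
As a rough illustration of the masked generative idea described in the abstract, the Python sketch below starts from a fully masked token sequence and fills it in over a few steps, either by taking the most likely token (deterministic mode) or by sampling (stochastic mode). It is not the authors' implementation: `toy_logits`, `generate`, the `MASK` id, the confidence-based unmasking schedule, and all sizes are hypothetical stand-ins, and the abstract does not specify the decoding schedule or the network.

```python
import math
import random

MASK = -1  # hypothetical id marking a position that is still masked


def toy_logits(image_feature, tokens, position, vocab_size):
    """Stand-in for a trained network (assumption): scores every candidate token.

    A real model would condition on image features and on the tokens already
    revealed; this toy version only uses the image id and the position.
    """
    rng = random.Random(image_feature * 1000 + position)
    return [rng.uniform(-1.0, 1.0) for _ in range(vocab_size)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(image_feature, seq_len=8, vocab_size=16, steps=4,
             stochastic=False, rng=None):
    """Iteratively unmask a token sequence conditioned on `image_feature`."""
    rng = rng or random.Random()
    tokens = [MASK] * seq_len
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = {}
        for i in masked:
            probs = softmax(toy_logits(image_feature, tokens, i, vocab_size))
            if stochastic:
                # Stochastic mode: sample a token from the predicted distribution.
                tok = rng.choices(range(vocab_size), weights=probs, k=1)[0]
            else:
                # Deterministic mode: take the most likely token.
                tok = max(range(vocab_size), key=probs.__getitem__)
            proposals[i] = (tok, probs[tok])
        # Reveal the most confident proposals, spreading the remaining
        # masked positions evenly over the remaining steps.
        n_reveal = math.ceil(len(masked) / (steps - step))
        by_confidence = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in by_confidence[:n_reveal]:
            tokens[i] = proposals[i][0]
    return tokens


if __name__ == "__main__":
    print("deterministic:", generate(3))
    for seed in range(3):
        print("stochastic:  ", generate(3, stochastic=True, rng=random.Random(seed)))
```

In this sketch, repeated deterministic calls for the same input return the same sequence, while stochastic calls with different seeds can return different sequences. In a real system, the spread across such samples is one possible way to gauge prediction uncertainty.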

Related papers

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking [17.371579113481644]
Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. We propose the Partial masking scheme (Prime), which augments MDM by allowing tokens to take intermediate states between the masked and unmasked states. Our method demonstrates superior performance across a diverse set of generative modeling tasks.
arXiv Detail & Related papers (2025-05-24T04:16:40Z)
ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization [51.904899019761594]
We propose ADHMR, a framework that Aligns a Diffusion-based HMR model in a preference optimization manner. First, we train a human mesh prediction assessment model, HMR-Scorer, capable of evaluating predictions even for in-the-wild images without 3D annotations. We then use HMR-Scorer to create a preference dataset, where each input image has a pair of winner and loser mesh predictions.
arXiv Detail & Related papers (2025-05-15T13:04:51Z)
SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM [8.525516300734024]
We introduce SeqSAM, a sequential, RNN-inspired approach to generating multiple masks. We show notable improvements in the quality of each mask produced across two publicly available datasets.
arXiv Detail & Related papers (2025-03-12T20:01:52Z)
GenHMR: Generative Human Mesh Recovery [14.708444067294325]
GenHMR is a novel generative framework that reformulates monocular HMR as an image-conditioned generative task. Experiments on benchmark datasets demonstrate that GenHMR significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-12-19T01:45:58Z)
CondiMen: Conditional Multi-Person Mesh Recovery [0.0]
We propose CondiMen, a method that outputs a joint parametric distribution over likely poses, body shapes, intrinsics and distances to the camera. We find that our model achieves performance on par with or better than the state-of-the-art.
arXiv Detail & Related papers (2024-12-17T16:22:56Z)
OFER: Occluded Face Expression Reconstruction [16.06622406877353]
We introduce OFER, a novel approach for single image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces. We propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on the predicted shape accuracy scores to select the best match.
arXiv Detail & Related papers (2024-10-29T00:21:26Z)
Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM). Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z)
Score-Guided Diffusion for 3D Human Recovery [10.562998991986102]
We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. We evaluate our approach on three settings/applications: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences.
arXiv Detail & Related papers (2024-03-14T17:56:14Z)
Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models [33.2565018922113]
This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image. We propose a generative framework called "Diffusion-based Human Mesh Recovery (Diff-HMR)".
arXiv Detail & Related papers (2023-08-05T22:23:04Z)
Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address occlusion by employing body clues provided by an extra network to distinguish the visible part. We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Under this condition, the occluded representation can be aligned spontaneously in a selected subspace.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose [20.099670445427964]
Reconstructing multi-human body mesh from a single monocular image is an important but challenging computer vision problem. In this work, through a single graph neural network, we construct coherent multi-human meshes using only multi-human 2D pose as input.
arXiv Detail & Related papers (2022-05-25T08:54:52Z)
Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence. We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses. We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
Monocular, One-stage, Regression of Multiple 3D People [105.3143785498094]
We propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). Our method simultaneously predicts a Body Center heatmap and a Mesh map, which can jointly describe the 3D body mesh on the pixel level. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks.
arXiv Detail & Related papers (2020-08-27T17:21:47Z)
Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.