MEGA: Masked Generative Autoencoder for Human Mesh Recovery
- URL: http://arxiv.org/abs/2405.18839v3
- Date: Thu, 14 Nov 2024 10:27:51 GMT
- Title: MEGA: Masked Generative Autoencoder for Human Mesh Recovery
- Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer,
- Abstract summary: Human Mesh Recovery from a single RGB image is a highly ambiguous problem.
Most HMR methods overlook this issue and make a single prediction without accounting for this ambiguity.
This work proposes a new approach based on masked generative modeling.
- Score: 33.26995842920877
- License:
- Abstract: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as an infinite set of 3D interpretations can explain the 2D observation equally well. Nevertheless, most HMR methods overlook this issue and make a single prediction without accounting for this ambiguity. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.
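As a rough illustration of the masked generative decoding described in the abstract, the sketch below iteratively unmasks a sequence of discrete pose-and-shape tokens conditioned on image features, keeping the most confident predictions in deterministic mode or sampling in stochastic mode. All names (predict_token_logits, NUM_TOKENS, MASK_ID, the unmasking schedule) are illustrative placeholders, not MEGA's actual interface.

```python
import numpy as np

NUM_TOKENS = 16       # length of the pose-and-shape token sequence (assumed)
VOCAB_SIZE = 512      # size of the discrete codebook (assumed)
MASK_ID = VOCAB_SIZE  # special id marking a masked position

def predict_token_logits(image_feat, tokens):
    """Placeholder for the image-conditioned network: per-position logits."""
    rng = np.random.default_rng(abs(hash(tokens.tobytes())) % (2**32))
    return rng.normal(size=(NUM_TOKENS, VOCAB_SIZE))

def decode_mesh_tokens(image_feat, num_steps=4, stochastic=False, rng=None):
    rng = rng or np.random.default_rng(0)
    tokens = np.full(NUM_TOKENS, MASK_ID, dtype=np.int64)   # start fully masked
    for step in range(num_steps):
        logits = predict_token_logits(image_feat, tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        if stochastic:                        # sample one plausible hypothesis
            pred = np.array([rng.choice(VOCAB_SIZE, p=p) for p in probs])
        else:                                 # deterministic single prediction
            pred = probs.argmax(axis=-1)
        masked = tokens == MASK_ID
        conf = np.where(masked, probs[np.arange(NUM_TOKENS), pred], -np.inf)
        # Fix a growing fraction of positions, keeping the most confident ones.
        target = int(np.ceil(NUM_TOKENS * (step + 1) / num_steps))
        num_new = target - int((~masked).sum())
        if num_new > 0:
            newly_fixed = np.argsort(-conf)[:num_new]
            tokens[newly_fixed] = pred[newly_fixed]
    return tokens  # fed to a decoder that maps tokens back to pose and shape

image_feat = np.zeros(256)                 # placeholder image embedding
single = decode_mesh_tokens(image_feat)    # deterministic mode
multiple = [decode_mesh_tokens(image_feat, stochastic=True,
                               rng=np.random.default_rng(s)) for s in range(3)]
```

Conditioning on partial token sequences (the autoencoder's other input) would amount to starting the loop with some positions already set instead of fully masked.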
Related papers
- GenHMR: Generative Human Mesh Recovery [14.708444067294325]
GenHMR is a novel generative framework that reformulates monocular HMR as an image-conditioned generative task.
Experiments on benchmark datasets demonstrate that GenHMR significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-12-19T01:45:58Z)
- CondiMen: Conditional Multi-Person Mesh Recovery [0.0]
We propose CondiMen, a method that outputs a joint parametric distribution over likely poses, body shapes, intrinsics and distances to the camera.
We find that our model achieves performance on par with or better than the state-of-the-art.
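A loose sketch of what a joint parametric distribution over poses, body shapes, intrinsics and camera distances could look like in code: a per-person Gaussian over the concatenated parameters, from which either a most-likely configuration or samples are drawn. The Gaussian form and the predict_distribution stub are assumptions of this sketch, not CondiMen's model.

```python
import numpy as np

def predict_distribution(image_feat, num_people=2):
    """Stand-in network: per-person mean/scale over [pose(72) | shape(10) | focal(1) | distance(1)]."""
    dim = 72 + 10 + 1 + 1
    means = np.zeros((num_people, dim))
    scales = np.full((num_people, dim), 0.1)
    return means, scales

def recover(image_feat, sample=False, seed=0):
    means, scales = predict_distribution(image_feat)
    if not sample:
        return means                               # most likely joint configuration
    rng = np.random.default_rng(seed)
    return means + scales * rng.normal(size=means.shape)  # one plausible joint sample
```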
arXiv Detail & Related papers (2024-12-17T16:22:56Z)
- OFER: Occluded Face Expression Reconstruction [16.06622406877353]
We introduce OFER, a novel approach for single image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces.
We propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on the predicted shape accuracy scores to select the best match.
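A minimal sketch of that ranking idea: draw several candidate shapes, score each, and keep the highest-scoring one. The functions sample_candidate_shape and predict_accuracy_score are hypothetical stand-ins for OFER's shape diffusion network and scoring head.

```python
import numpy as np

def sample_candidate_shape(image_feat, rng):
    return rng.normal(size=100)           # placeholder shape parameters

def predict_accuracy_score(image_feat, shape):
    return float(-np.linalg.norm(shape))  # placeholder score (higher = better)

def reconstruct(image_feat, num_candidates=8, seed=0):
    rng = np.random.default_rng(seed)
    candidates = [sample_candidate_shape(image_feat, rng) for _ in range(num_candidates)]
    scores = [predict_accuracy_score(image_feat, c) for c in candidates]
    order = np.argsort(scores)[::-1]      # rank candidates, best first
    return candidates[order[0]]           # select the best match

best_shape = reconstruct(np.zeros(128))   # toy usage with a placeholder image feature
```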
arXiv Detail & Related papers (2024-10-29T00:21:26Z)
- Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM).
Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians.
To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z)
- Score-Guided Diffusion for 3D Human Recovery [10.562998991986102]
We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction.
ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model, as sketched below.
We evaluate our approach on three settings/applications: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences.
arXiv Detail & Related papers (2024-03-14T17:56:14Z)
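The ScoreHMR entry above replaces iterative model fitting with score guidance; a minimal sketch of that idea, assuming a generic reverse-diffusion denoiser and a differentiable data-fit term, is given below. denoise_step, data_fit_grad and guidance_scale are illustrative placeholders, not ScoreHMR's API.

```python
import numpy as np

def denoise_step(z_t, t):
    """Stand-in for one step of the unconditional diffusion denoiser."""
    return 0.95 * z_t

def data_fit_grad(z, keypoints_2d):
    """Stand-in for the gradient of the image-alignment loss w.r.t. the latent."""
    return z - keypoints_2d.mean()        # placeholder gradient

def guided_sampling(keypoints_2d, num_steps=50, guidance_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=32)               # start from Gaussian noise
    for t in reversed(range(num_steps)):
        z = denoise_step(z, t)                                    # reverse-diffusion step
        z = z - guidance_scale * data_fit_grad(z, keypoints_2d)   # score guidance toward the observation
    return z                              # latent decoded into body model parameters downstream

keypoints_2d = np.zeros((17, 2))          # detected 2D keypoints (placeholder)
latent = guided_sampling(keypoints_2d)
```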
- Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models [33.2565018922113]
This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image.
We propose a generative framework called "Diffusion-based Human Mesh Recovery (Diff-HMR)".
arXiv Detail & Related papers (2023-08-05T22:23:04Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge.
Under this condition, the occluded representation can be spontaneously well aligned in a selected subspace.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Monocular, One-stage, Regression of Multiple 3D People [105.3143785498094]
We propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP).
Our method simultaneously predicts a Body Center heatmap and a Mesh map, which can jointly describe the 3D body mesh on the pixel level.
Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks.
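A loose sketch of the pixel-level readout described above: peaks in a body-center heatmap select pixel locations, and the mesh parameter map is read at those locations to yield one parameter vector per person. Array shapes and the detection threshold are illustrative assumptions, not ROMP's configuration.

```python
import numpy as np

def extract_people(center_heatmap, mesh_map, threshold=0.3):
    """center_heatmap: (H, W); mesh_map: (H, W, D) with D mesh parameters per pixel."""
    ys, xs = np.where(center_heatmap > threshold)        # candidate body centers
    people = []
    for y, x in zip(ys, xs):
        patch = center_heatmap[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        if center_heatmap[y, x] >= patch.max():          # keep local maxima only
            people.append(mesh_map[y, x])                # per-pixel mesh parameters
    return people

# Toy usage: one synthetic person centered at pixel (4, 5).
heat = np.zeros((8, 8))
heat[4, 5] = 1.0
mesh = np.random.default_rng(0).normal(size=(8, 8, 85))
params = extract_people(heat, mesh)   # -> list with one 85-dim parameter vector
```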
arXiv Detail & Related papers (2020-08-27T17:21:47Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)