MEGA: Masked Generative Autoencoder for Human Mesh Recovery
- URL: http://arxiv.org/abs/2405.18839v2
- Date: Fri, 31 May 2024 14:03:07 GMT
- Title: MEGA: Masked Generative Autoencoder for Human Mesh Recovery
- Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer,
- Abstract summary: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem.
Most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty.
We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh sequences.
- Score: 33.26995842920877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as similar 2D projections can correspond to multiple 3D interpretations. Nevertheless, most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. MEGA enables us to propose multiple outputs and to evaluate the uncertainty of the predictions. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.
Related papers
- Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS)
MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - Learning Gaussian Representation for Eye Fixation Prediction [54.88001757991433]
Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.
We introduce Gaussian Representation for eye fixation modeling.
We design our framework upon some lightweight backbones to achieve real-time fixation prediction.
arXiv Detail & Related papers (2024-03-21T20:28:22Z) - Generative Approach for Probabilistic Human Mesh Recovery using
Diffusion Models [33.2565018922113]
This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image.
We propose a generative approach framework, called "Diffusion-based Human Mesh Recovery (Diff-HMR)"
arXiv Detail & Related papers (2023-08-05T22:23:04Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion
Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z) - Zero-Episode Few-Shot Contrastive Predictive Coding: Solving
intelligence tests without prior training [0.0]
We argue that finding a predictive latent variable and using it to evaluate the consistency of a future image enables data-efficient predictions.
We show that a one-dimensional Markov Contrastive Predictive Coding model solves sequence completion intelligence tests efficiently.
arXiv Detail & Related papers (2022-05-04T07:46:03Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous
Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z) - Weakly Supervised Generative Network for Multiple 3D Human Pose
Hypotheses [74.48263583706712]
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth.
We propose a weakly supervised deep generative network to address the inverse problem.
arXiv Detail & Related papers (2020-08-13T09:26:01Z) - Multitask Non-Autoregressive Model for Human Motion Prediction [33.98939145212708]
Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module.
Our approach is evaluated on Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
arXiv Detail & Related papers (2020-07-13T15:00:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.