VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
- URL: http://arxiv.org/abs/2312.08291v4
- Date: Mon, 15 Jul 2024 12:47:03 GMT
- Title: VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
- Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
- Abstract summary: This work introduces a novel paradigm to address the Human Pose and Shape Estimation problem.
Instead of predicting body model parameters, we focus on predicting the proposed discrete latent representation.
The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh.
- Score: 43.368963897752664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for realistic results, whereas recent non-parametric methods achieve higher precision by directly regressing the 3D coordinates of the human body mesh. This work introduces a novel paradigm to address the HPSE problem, involving a low-dimensional discrete latent representation of the human mesh and framing HPSE as a classification task. Instead of predicting body model parameters or 3D vertex coordinates, we focus on predicting the proposed discrete latent representation, which can be decoded into a registered human mesh. This innovative paradigm offers two key advantages. Firstly, predicting a low-dimensional discrete representation confines our predictions to the space of anthropomorphic poses and shapes even when little training data is available. Secondly, by framing the problem as a classification task, we can harness the discriminative power inherent in neural networks. The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh. The experimental results demonstrate that VQ-HPS outperforms the current state-of-the-art non-parametric approaches while yielding results as realistic as those produced by parametric methods when trained with little data. VQ-HPS also shows promising results when training on large-scale datasets, highlighting the significant potential of the classification approach for HPSE. See the project page at https://g-fiche.github.io/research-pages/vqhps/
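The following PyTorch sketch illustrates the core paradigm described above: a codebook quantizes continuous mesh latents into discrete indices, and HPSE reduces to classification over those indices. The layer sizes, codebook size, and module layout are illustrative assumptions, not the authors' actual mesh autoencoder.

```python
# Minimal sketch of the VQ-HPS paradigm (assumed sizes, not the paper's
# architecture): a codebook quantizes mesh latents into discrete indices,
# and pose/shape estimation becomes classification over those indices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, latent_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, latent_dim)

    def forward(self, z):                              # z: (B, T, latent_dim)
        # Nearest codebook entry for each continuous latent vector.
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        indices = d.argmin(dim=-1)                     # discrete representation
        z_q = self.codebook(indices)                   # quantized latents
        return z_q, indices

def classification_loss(logits, target_indices):
    # logits: (B, T, num_codes); target_indices: (B, T) from the quantizer.
    # Framing HPSE as classification lets cross-entropy drive training.
    return F.cross_entropy(logits.flatten(0, 1), target_indices.flatten())

# Toy usage: quantize random mesh latents, then supervise a classifier.
vq = VectorQuantizer()
z = torch.randn(2, 16, 64)                             # 16 latent tokens per mesh
z_q, idx = vq(z)
logits = torch.randn(2, 16, 512, requires_grad=True)  # image-backbone output
loss = classification_loss(logits, idx)
loss.backward()
```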
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability, both at training and test time, to query any point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
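A hedged sketch of the point-query idea (the actual model and feature conditioning are more involved): an MLP maps any continuous query point in a canonical human volume, together with image features, to a predicted 3D location. All names and sizes below are assumptions.

```python
# Illustrative localizer-field query, not the paper's implementation.
import torch
import torch.nn as nn

class LocalizerField(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3))                    # predicted 3D location

    def forward(self, query_xyz, img_feat):
        # query_xyz: (B, N, 3) canonical points; img_feat: (B, feat_dim).
        f = img_feat.unsqueeze(1).expand(-1, query_xyz.shape[1], -1)
        return self.mlp(torch.cat([query_xyz, f], dim=-1))

# Because any point can be queried, supervision can come from meshes,
# skeletons, or dense correspondences without converting between formats.
field = LocalizerField()
pts = torch.rand(2, 100, 3)                       # e.g. mesh vertices or joints
out = field(pts, torch.randn(2, 256))             # (2, 100, 3)
```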
arXiv Detail & Related papers (2024-07-10T10:44:18Z) - An Embeddable Implicit IUVD Representation for Part-based 3D Human Surface Reconstruction [22.349648542401805]
To reconstruct a 3D human surface from a single image, it is crucial to simultaneously consider human pose, shape, and clothing details.
Recent approaches combine parametric body models (such as SMPL), which capture body pose and shape priors, with implicit functions that capture clothing details.
We propose a novel IUVD-Feedback representation consisting of an IUVD occupancy function and a feedback query algorithm.
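As a rough illustration (the paper's actual representation and feedback query algorithm are not reproduced here), an occupancy function can be defined over IUVD coordinates: a body-part index I, surface coordinates UV, and a displacement D from the body surface.

```python
# Hypothetical IUVD occupancy query; the embedding/MLP layout is an assumption.
import torch
import torch.nn as nn

class IUVDOccupancy(nn.Module):
    def __init__(self, num_parts=24, part_dim=16):
        super().__init__()
        self.part_emb = nn.Embedding(num_parts, part_dim)  # body-part index I
        self.mlp = nn.Sequential(
            nn.Linear(part_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())               # occupancy in [0, 1]

    def forward(self, part_idx, uvd):
        # part_idx: (N,) long; uvd: (N, 3) = surface UV + displacement D.
        x = torch.cat([self.part_emb(part_idx), uvd], dim=-1)
        return self.mlp(x).squeeze(-1)

occ = IUVDOccupancy()
inside = occ(torch.randint(0, 24, (8,)), torch.rand(8, 3))  # (8,) occupancies
```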
arXiv Detail & Related papers (2024-01-30T08:14:04Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement over the state of the art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
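A minimal sketch of the learned-descent loop, assuming a stand-in network that predicts per-vertex displacements (the real LVD conditions each update on local image or volumetric features):

```python
# Learned vertex descent, reduced to its iterative structure.
import torch
import torch.nn as nn

# Hypothetical stand-in for the learned per-vertex update predictor.
step_net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 3))

def lvd_fit(vertices, num_iters=10):
    # vertices: (V, 3) current mesh estimate, e.g. a template mesh.
    for _ in range(num_iters):
        with torch.no_grad():
            delta = step_net(vertices)   # learned descent direction per vertex
        vertices = vertices + delta      # move vertices toward the target body
    return vertices

fitted = lvd_fit(torch.zeros(6890, 3))   # 6890 = SMPL vertex count
```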
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads following two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both the pose and joint levels.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
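One hedged way to realize uncertainty from two diverse output heads (the heads below are placeholders, not MRP-Net's actual configurations) is to read their disagreement as a per-joint uncertainty measure:

```python
# Disagreement between two differently configured heads as uncertainty.
import torch
import torch.nn as nn

backbone = nn.Linear(512, 256)                             # shared feature trunk
head_a = nn.Sequential(nn.ReLU(), nn.Linear(256, 17 * 3))  # configuration A
head_b = nn.Sequential(nn.Tanh(), nn.Linear(256, 17 * 3))  # configuration B

feat = backbone(torch.randn(4, 512))
pose_a = head_a(feat).view(4, 17, 3)
pose_b = head_b(feat).view(4, 17, 3)

joint_uncertainty = (pose_a - pose_b).norm(dim=-1)   # (4, 17) per-joint
pose_uncertainty = joint_uncertainty.mean(dim=-1)    # (4,) per-pose
```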
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
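A sketch of the adversarial setup under assumed dimensions: a discriminator learns to separate real SMPL pose parameters from generated ones, pushing the generator toward the realistic-pose manifold.

```python
# Adversarial training over SMPL pose parameters; layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_DIM = 69                                # SMPL body pose (23 joints x 3)
disc = nn.Sequential(nn.Linear(POSE_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
gen = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, POSE_DIM))

real_poses = torch.randn(16, POSE_DIM)       # stand-in for mocap SMPL poses
fake_poses = gen(torch.randn(16, 32))        # poses sampled from the prior

# Standard GAN losses over pose parameters.
d_loss = (F.binary_cross_entropy_with_logits(disc(real_poses), torch.ones(16, 1))
          + F.binary_cross_entropy_with_logits(disc(fake_poses.detach()),
                                               torch.zeros(16, 1)))
g_loss = F.binary_cross_entropy_with_logits(disc(fake_poses), torch.ones(16, 1))
```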
arXiv Detail & Related papers (2021-12-08T10:05:32Z) - HHP-Net: A light Heteroscedastic neural network for Head Pose estimation
with uncertainty [2.064612766965483]
We introduce a novel method to estimate the head pose of people in single images starting from a small set of head keypoints.
Our model is simple to implement and more efficient than the state of the art.
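A minimal sketch of the heteroscedastic ingredient named in the title: the network predicts both an estimate and its variance, and a Gaussian negative log-likelihood down-weights uncertain predictions. The three-angle (yaw/pitch/roll) layout is an assumption.

```python
# Heteroscedastic regression loss with a per-output predicted variance.
import torch

def heteroscedastic_nll(pred, log_var, target):
    # pred, log_var, target: (B, 3) = yaw/pitch/roll per image.
    # Large predicted variance reduces the residual term but pays a
    # log-variance penalty, so the network learns calibrated uncertainty.
    return (0.5 * torch.exp(-log_var) * (pred - target) ** 2
            + 0.5 * log_var).mean()

pred = torch.randn(8, 3, requires_grad=True)
log_var = torch.zeros(8, 3, requires_grad=True)
loss = heteroscedastic_nll(pred, log_var, torch.randn(8, 3))
loss.backward()
```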
arXiv Detail & Related papers (2021-11-02T08:55:45Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
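A hedged sketch of the learning-to-learn idea: a learned update network proposes the next state estimate from the current one and a fitting-error signal, replacing test-time gradient descent. The dimensions and update rule below are illustrative, not HUND's.

```python
# Learned iterative refinement in place of gradient-based fitting.
import torch
import torch.nn as nn

STATE_DIM = 85                               # e.g. pose + shape + camera
update_net = nn.Sequential(nn.Linear(STATE_DIM + 1, 256), nn.ReLU(),
                           nn.Linear(256, STATE_DIM))

def refine(state, loss_fn, num_steps=5):
    # state: (B, STATE_DIM); loss_fn returns a per-sample fitting error.
    for _ in range(num_steps):
        err = loss_fn(state).reshape(-1, 1)  # scalar error signal per sample
        state = state + update_net(torch.cat([state, err], dim=-1))
    return state

state = torch.zeros(2, STATE_DIM)
refined = refine(state, lambda s: s.pow(2).mean(dim=-1))
```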
arXiv Detail & Related papers (2020-08-16T13:38:41Z) - I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human
Pose and Mesh Estimation from a Single RGB Image [79.040930290399]
We propose I2L-MeshNet, an image-to-lixel (line+pixel) prediction network.
The proposed I2L-MeshNet predicts the per-lixel likelihood on 1D heatmaps for each mesh coordinate instead of directly regressing the parameters.
Our lixel-based 1D heatmap preserves the spatial relationship in the input image and models the prediction uncertainty.
arXiv Detail & Related papers (2020-08-09T12:13:31Z)