Synthetic Training for Accurate 3D Human Pose and Shape Estimation in
the Wild
- URL: http://arxiv.org/abs/2009.10013v2
- Date: Tue, 22 Sep 2020 10:27:05 GMT
- Title: Synthetic Training for Accurate 3D Human Pose and Shape Estimation in
the Wild
- Authors: Akash Sengupta and Ignas Budvytis and Roberto Cipolla
- Abstract summary: This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image.
We propose STRAPS, a system that uses proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network.
We show that STRAPS outperforms other state-of-the-art methods on SSP-3D in terms of shape prediction accuracy.
- Score: 27.14060158187953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of monocular 3D human shape and pose
estimation from an RGB image. Despite great progress in this field in terms of
pose prediction accuracy, state-of-the-art methods often predict inaccurate
body shapes. We suggest that this is primarily due to the scarcity of
in-the-wild training data with diverse and accurate body shape labels. Thus, we
propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system
that utilises proxy representations, such as silhouettes and 2D joints, as
inputs to a shape and pose regression neural network, which is trained with
synthetic training data (generated on-the-fly during training using the SMPL
statistical body model) to overcome data scarcity. We bridge the gap between
synthetic training inputs and noisy real inputs, which are predicted by
keypoint detection and segmentation CNNs at test-time, by using data
augmentation and corruption during training. In order to evaluate our approach,
we curate and provide a challenging evaluation dataset for monocular human
shape estimation, Sports Shape and Pose 3D (SSP-3D). It consists of RGB images
of tightly-clothed sports-persons with a variety of body shapes and
corresponding pseudo-ground-truth SMPL shape and pose parameters, obtained via
multi-frame optimisation. We show that STRAPS outperforms other
state-of-the-art methods on SSP-3D in terms of shape prediction accuracy, while
remaining competitive with the state-of-the-art on pose-centric datasets and
metrics.
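As a purely illustrative aside (not the authors' released code), the Python sketch below shows the general shape of such an on-the-fly synthetic training pipeline: random SMPL shape and pose parameters are sampled, converted to proxy inputs (a silhouette and 2D joints; the SMPL mesh generation, rendering and joint projection are stubbed out with random placeholders here), and then corrupted with box occlusion, keypoint noise and keypoint dropout to mimic the noisy detector outputs seen at test time. All function names, image sizes and corruption parameters are assumptions for illustration, not values from the paper.

# Minimal sketch of STRAPS-style on-the-fly synthetic proxy-input generation
# with corruption augmentation (hypothetical, for illustration only).
import numpy as np

IMG_SIZE = 256
NUM_JOINTS = 17  # assumed 2D joint convention (e.g. COCO-style)

def sample_smpl_params(rng):
    """Sample random SMPL shape (betas) and pose parameters."""
    betas = rng.normal(0.0, 1.25, size=10)   # shape coefficients (illustrative spread)
    pose = rng.normal(0.0, 0.3, size=72)     # axis-angle joint rotations
    return betas, pose

def render_proxy(betas, pose, rng):
    """Placeholder for SMPL mesh -> silhouette + projected 2D joints."""
    silhouette = (rng.random((IMG_SIZE, IMG_SIZE)) > 0.7).astype(np.float32)
    joints_2d = rng.uniform(0, IMG_SIZE, size=(NUM_JOINTS, 2)).astype(np.float32)
    return silhouette, joints_2d

def corrupt(silhouette, joints_2d, rng,
            occlusion_prob=0.3, joint_noise_std=8.0, joint_drop_prob=0.1):
    """Corrupt clean synthetic inputs so they resemble noisy detector outputs."""
    if rng.random() < occlusion_prob:                 # random box occlusion
        x, y = rng.integers(0, IMG_SIZE - 64, size=2)
        silhouette[y:y + 64, x:x + 64] = 0.0
    joints_2d = joints_2d + rng.normal(0, joint_noise_std, joints_2d.shape)
    drop = rng.random(NUM_JOINTS) < joint_drop_prob   # simulate missed detections
    joints_2d[drop] = 0.0
    return silhouette, joints_2d

def synthetic_batch(batch_size=8, seed=0):
    """Generate one batch of (proxy inputs, SMPL targets) on the fly."""
    rng = np.random.default_rng(seed)
    inputs, targets = [], []
    for _ in range(batch_size):
        betas, pose = sample_smpl_params(rng)
        sil, j2d = render_proxy(betas, pose, rng)
        sil, j2d = corrupt(sil, j2d, rng)
        inputs.append((sil, j2d))
        targets.append((betas, pose))
    return inputs, targets

if __name__ == "__main__":
    batch_inputs, batch_targets = synthetic_batch()
    print(len(batch_inputs), batch_inputs[0][0].shape, batch_inputs[0][1].shape)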
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability - both at training and test time - to query any arbitrary point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and its per-frame deformation.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z) - LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human
Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic
Occlusion-Aware Data and Neural Mesh Rendering [3.007707487678111]
We propose a framework that synthesizes silhouette and 2D keypoint data and directly regresses the SMPL pose and shape parameters.
A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation (a minimal, illustrative silhouette-loss sketch follows this list).
Our method is among the state of the art on the 3DPW dataset in terms of pose accuracy and clearly outperforms the rank-1 method in terms of shape accuracy.
arXiv Detail & Related papers (2021-08-01T02:09:16Z) - Learning Transferable Kinematic Dictionary for 3D Human Pose and Shape
Reconstruction [15.586347115568973]
We propose a kinematic dictionary, which explicitly regularizes the solution space of relative 3D rotations of human joints.
Our method achieves end-to-end 3D reconstruction without requiring any shape annotations during the training of the neural networks.
The proposed method achieves competitive results on large-scale datasets including Human3.6M, MPI-INF-3DHP, and LSP.
arXiv Detail & Related papers (2021-04-02T09:24:29Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z) - Cascaded deep monocular 3D human pose estimation with evolutionary
training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)