LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic
Occlusion-Aware Data and Neural Mesh Rendering
- URL: http://arxiv.org/abs/2108.00351v1
- Date: Sun, 1 Aug 2021 02:09:16 GMT
- Title: LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic
Occlusion-Aware Data and Neural Mesh Rendering
- Authors: Kaibing Yang, Renshu Gu, Masahiro Toyoura and Gang Xu
- Abstract summary: We propose a framework that synthesizes occlusion-aware silhouette and 2D keypoints data and directly regresses the SMPL pose and shape parameters.
A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to substantial improvements in shape estimation.
We rank among the state-of-the-art on the 3DPW dataset in terms of pose accuracy and clearly outperform the rank-1 method in terms of shape accuracy.
- Score: 3.007707487678111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key challenge in the task of human pose and shape estimation is occlusion,
including self-occlusions, object-human occlusions, and inter-person
occlusions. The lack of diverse and accurate pose and shape training data
becomes a major bottleneck, especially for scenes with occlusions in the wild.
In this paper, we focus on the estimation of human pose and shape in the case
of inter-person occlusions, while also handling object-human occlusions and
self-occlusion. We propose a framework that synthesizes occlusion-aware
silhouette and 2D keypoints data and directly regresses the SMPL pose and
shape parameters. A neural 3D mesh renderer is exploited to enable silhouette
supervision on the fly, which contributes to substantial improvements in shape
estimation. In addition, keypoints-and-silhouette-driven training data in
panoramic viewpoints are synthesized to compensate for the lack of viewpoint
diversity in any existing dataset. Experimental results show that our method
ranks among the state-of-the-art on the 3DPW dataset in terms of pose accuracy
and clearly outperforms the rank-1 method in terms of shape accuracy. Top
performance is also achieved on SSP-3D in terms of shape prediction accuracy.
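The on-the-fly silhouette supervision described above can be illustrated with a minimal sketch: compare a soft silhouette rendered from the predicted SMPL mesh against the binary target mask. The soft-IoU form and the function names below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def soft_silhouette_loss(pred_mask, gt_mask, eps=1e-6):
    """Soft IoU loss between a rendered soft silhouette and a binary target.

    pred_mask: float array in [0, 1], e.g. the output of a neural mesh
               renderer, where gradients can flow back to mesh vertices.
    gt_mask:   binary array of the same shape (the target silhouette).
    """
    inter = (pred_mask * gt_mask).sum()
    union = (pred_mask + gt_mask - pred_mask * gt_mask).sum()
    return 1.0 - inter / (union + eps)

# A well-overlapping soft prediction yields a small loss.
pred = np.full((4, 4), 0.9)
gt = np.ones((4, 4))
loss = soft_silhouette_loss(pred, gt)  # ≈ 0.1
```

In a real pipeline the soft mask would come from a differentiable rasterizer, so minimizing this loss adjusts the SMPL shape parameters directly.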
Related papers
- Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen
Objects [32.32128461720876]
Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data.
We systematically evaluate single-view 3D shape retrieval along three different axes: the presence of object occlusions and truncations, generalization to unseen 3D shape data, and generalization to unseen objects in the input images.
arXiv Detail & Related papers (2023-12-31T05:39:38Z) - Learning Visibility for Robust Dense Human Body Estimation [78.37389398573882]
Estimating 3D human pose and shape from 2D images is a crucial yet challenging task.
We learn dense human body estimation that is robust to partial observations.
We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates.
arXiv Detail & Related papers (2022-08-23T00:01:05Z) - RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z) - PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and
Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z) - LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human
Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D meshes of multiple body parts with large scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
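The per-joint scale idea can be sketched as a pinhole projection where each joint's scale is derived from its own depth rather than a single global scale. This is a generic illustration under an assumed camera model, not the paper's exact D2S formulation; the focal length and image center are placeholder values.

```python
import numpy as np

def d2s_project(joints_3d, focal=1000.0, center=(128.0, 128.0)):
    """Project 3D joints with a per-joint scale derived from each joint's depth.

    Instead of one global scale (weak perspective), each joint j gets
    scale_j = focal / z_j, so depth differences between body parts change
    their projected size. joints_3d: (J, 3) array in camera coordinates.
    """
    z = joints_3d[:, 2:3]
    scale = focal / z                          # per-joint scale variant
    xy = joints_3d[:, :2] * scale + np.asarray(center)
    return xy, scale

# A joint twice as deep projects with half the scale.
joints = np.array([[0.1, 0.2, 2.0],
                   [0.1, 0.2, 4.0]])
xy, s = d2s_project(joints)  # s = [[500.], [250.]]
```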
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods degrade when the target person is occluded by other objects, or when the motion is too fast or too slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - Synthetic Training for Accurate 3D Human Pose and Shape Estimation in
the Wild [27.14060158187953]
This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image.
We propose STRAPS, a system that uses proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network.
We show that STRAPS outperforms other state-of-the-art methods on SSP-3D in terms of shape prediction accuracy.
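The proxy-representation idea — feeding silhouettes and 2D joints to the regressor instead of raw pixels — can be sketched as stacking a binary mask with per-joint Gaussian heatmaps into one multi-channel input. The channel layout, heatmap form, and function names here are illustrative assumptions, not STRAPS's exact input encoding.

```python
import numpy as np

def build_proxy_input(silhouette, keypoints, size=64, sigma=2.0):
    """Stack a binary silhouette with per-joint Gaussian heatmaps.

    silhouette: (size, size) binary mask.
    keypoints:  (J, 2) array of (x, y) pixel coordinates.
    Returns a (J + 1, size, size) array to feed a regression network.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    maps = [silhouette.astype(np.float32)]
    for x, y in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        maps.append(g.astype(np.float32))
    return np.stack(maps, axis=0)

sil = np.zeros((64, 64))
sil[16:48, 24:40] = 1.0                 # a crude torso-shaped mask
kps = np.array([[32.0, 20.0], [32.0, 44.0]])
proxy = build_proxy_input(sil, kps)     # shape (3, 64, 64)
```

Because such proxies can be rendered synthetically for any pose, this design sidesteps the scarcity of in-the-wild 3D annotations.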
arXiv Detail & Related papers (2020-09-21T16:39:04Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people from an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z) - 3D Human Pose Estimation using Spatio-Temporal Networks with Explicit
Occlusion Training [40.933783830017035]
Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years.
We introduce a spatio-temporal video network for robust 3D human pose estimation.
We apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints.
arXiv Detail & Related papers (2020-04-07T09:12:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.