Weakly Supervised 3D Human Pose and Shape Reconstruction with
Normalizing Flows
- URL: http://arxiv.org/abs/2003.10350v2
- Date: Sat, 22 Aug 2020 14:46:19 GMT
- Title: Weakly Supervised 3D Human Pose and Shape Reconstruction with
Normalizing Flows
- Authors: Andrei Zanfir, Eduard Gabriel Bazavan, Hongyi Xu, Bill Freeman, Rahul
Sukthankar and Cristian Sminchisescu
- Abstract summary: We present semi-supervised and self-supervised models that support training and good generalization in real-world images and video.
Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions.
In extensive experiments using 3D motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, we show that the proposed methods outperform the state of the art.
- Score: 43.89097619822221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D human pose and shape estimation is challenging due to the many
degrees of freedom of the human body and the difficulty of acquiring training data
for large-scale supervised learning in complex visual scenes. In this paper we
present practical semi-supervised and self-supervised models that support
training and good generalization in real-world images and video. Our
formulation is based on kinematic latent normalizing flow representations and
dynamics, as well as differentiable, semantic body part alignment loss
functions that support self-supervised learning. In extensive experiments using
3D motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, as well as
image repositories like COCO, we show that the proposed methods outperform the
state of the art, supporting the practical construction of an accurate family
of models based on large-scale training with diverse and incompletely labeled
image and video data.
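As a rough illustration of the normalizing flow idea the abstract refers to, the sketch below implements a single element-wise affine flow and evaluates an exact log-density through the change-of-variables formula. It is not the paper's kinematic latent model: the function names, the `shift` and `log_scale` parameters, and the standard normal base distribution are all assumptions chosen for illustration.

```python
import math

# Hypothetical one-layer affine flow (illustrative only, not the paper's model).
# The invertible map is z = (x - shift) * exp(-log_scale), applied per dimension.

def forward(x, shift, log_scale):
    """Map data x to latent z and return (z, log|det dz/dx|)."""
    z = [(xi - s) * math.exp(-ls) for xi, s, ls in zip(x, shift, log_scale)]
    log_det = -sum(log_scale)  # Jacobian is diagonal with entries exp(-log_scale)
    return z, log_det

def inverse(z, shift, log_scale):
    """Map latent z back to data space, undoing forward()."""
    return [zi * math.exp(ls) + s for zi, s, ls in zip(z, shift, log_scale)]

def standard_normal_log_prob(z):
    """Log-density of an isotropic standard normal, summed over dimensions."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)

def log_prob(x, shift, log_scale):
    """Change of variables: log p(x) = log p_z(f(x)) + log|det df/dx|."""
    z, log_det = forward(x, shift, log_scale)
    return standard_normal_log_prob(z) + log_det

# Usage: encode a 2D point, decode it again, and score it under the flow.
x = [0.3, -1.2]
shift = [0.1, 0.2]
log_scale = [math.log(2.0), 0.0]
z, _ = forward(x, shift, log_scale)
x_reconstructed = inverse(z, shift, log_scale)
score = log_prob(x, shift, log_scale)
```

Practical flows stack many such invertible layers with learned parameters. The single affine layer here only shows why both sampling (via `inverse`) and exact likelihood evaluation (via `log_prob`) are available from the same map.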
Related papers
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance [25.346255905155424]
We introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework.
By representing the 3D human parametric model as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion.
Our approach also exhibits superior generalization capabilities on the proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-03-21T18:52:58Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- Self-Supervised Human Depth Estimation from Monocular Videos [99.39414134919117]
Previous methods for estimating detailed human depth often require supervised training with ground truth depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
arXiv Detail & Related papers (2020-05-07T09:45:11Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)