PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge
Distillation
- URL: http://arxiv.org/abs/2003.03473v2
- Date: Thu, 12 Nov 2020 05:07:51 GMT
- Title: PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge
Distillation
- Authors: Shashank Tripathi, Siddhant Ranade, Ambrish Tyagi, Amit Agrawal
- Abstract summary: PoseNet3D takes 2D joints as input and outputs 3D skeletons and SMPL body model parameters.
We first train a teacher network that outputs 3D skeletons, using only 2D poses for training. The teacher network distills its knowledge to a student network that predicts 3D pose in SMPL representation.
Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach reduces the 3D joint prediction error by 18% compared to previous unsupervised methods.
- Score: 6.023152721616894
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recovering 3D human pose from 2D joints is a highly unconstrained problem. We
propose a novel neural network framework, PoseNet3D, that takes 2D joints as
input and outputs 3D skeletons and SMPL body model parameters. By casting our
learning approach in a student-teacher framework, we avoid using any form of
3D supervision, such as paired/unpaired 3D annotations, motion capture
sequences, depth images, or multi-view images, during training. We first train
a teacher network that
outputs 3D skeletons, using only 2D poses for training. The teacher network
distills its knowledge to a student network that predicts 3D pose in SMPL
representation. Finally, both the teacher and the student networks are jointly
fine-tuned in an end-to-end manner using temporal, self-consistency and
adversarial losses, improving the accuracy of each individual network. Results
on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach
reduces the 3D joint prediction error by 18% compared to previous unsupervised
methods. Qualitative results on in-the-wild datasets show that the recovered 3D
poses and meshes are natural, realistic, and flow smoothly over consecutive
frames.
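To make the three-stage recipe concrete, the sketch below mirrors the pipeline in PyTorch: a teacher lifts 2D joints to 3D skeletons, a student regresses SMPL parameters distilled against the teacher, and joint fine-tuning adds a temporal smoothness term. Everything here is an illustrative assumption (network bodies, loss weights, and the `smpl_joints` stand-in for SMPL forward kinematics), not the authors' implementation; the self-consistency and adversarial terms are omitted for brevity.
```python
# Minimal PyTorch sketch of the teacher-student pipeline described in the
# abstract. Module bodies, loss weights, and the SMPL stand-in below are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Lifts 2D joint sequences (B, T, J, 2) to 3D skeletons (B, T, J, 3)."""
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3))

    def forward(self, joints_2d):
        b, t, j, _ = joints_2d.shape
        return self.net(joints_2d.reshape(b * t, j * 2)).reshape(b, t, j, 3)

class Student(nn.Module):
    """Regresses SMPL parameters (72 pose + 10 shape) from the same 2D joints."""
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 72 + 10))

    def forward(self, joints_2d):
        b, t, j, _ = joints_2d.shape
        return self.net(joints_2d.reshape(b * t, j * 2)).reshape(b, t, -1)

def smpl_joints(params):
    """Toy stand-in for SMPL forward kinematics (params -> 3D joints).
    A real implementation would pose the SMPL body model; here we simply
    reshape the first 51 values so the sketch runs end to end."""
    b, t, _ = params.shape
    return params[..., :51].reshape(b, t, 17, 3)

def training_step(teacher, student, joints_2d, opt):
    pred_t = teacher(joints_2d)               # stage 1: lift 2D joints to 3D
    pred_s = smpl_joints(student(joints_2d))  # stage 2: distill into SMPL space
    distill = (pred_s - pred_t.detach()).square().mean()
    # Stage 3 (joint fine-tuning) adds temporal smoothness across consecutive
    # frames; the paper's self-consistency and adversarial terms are omitted.
    temporal = (pred_s[:, 1:] - pred_s[:, :-1]).square().mean()
    loss = distill + 0.1 * temporal           # 0.1 is an assumed weight
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

teacher, student = Teacher(), Student()
opt = torch.optim.Adam(
    list(teacher.parameters()) + list(student.parameters()), lr=1e-4)
print(training_step(teacher, student, torch.randn(4, 16, 17, 2), opt))
```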
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, with 2D-to-3D pose lifting using a transformer-based network.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to the 3D poses obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures the shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
- Learning to Estimate 3D Human Pose from Point Cloud [13.27496851711973]
We propose a deep human pose network for 3D pose estimation that takes point cloud data as input to model the surface of complex human structures.
Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-the-art methods.
arXiv Detail & Related papers (2022-12-25T14:22:01Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of the limbs from local image evidence in order to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multiple-view camera system; a minimal triangulation sketch follows this entry.
Our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
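The pseudo-labeling step above rests on standard multi-view triangulation. Below is a minimal NumPy sketch of the generic Direct Linear Transform (DLT); the function name and shapes are assumptions for illustration, not that paper's code.
```python
# Minimal NumPy sketch of Direct Linear Transform (DLT) triangulation, the
# standard way to lift matched 2D detections from calibrated views to 3D.
# This illustrates the generic technique, not the paper's implementation.
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from N calibrated views.

    proj_mats: (N, 3, 4) camera projection matrices.
    points_2d: (N, 2) pixel coordinates of the same joint in each view.
    Returns the (3,) point minimizing the algebraic error."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)          # (2N, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                  # null-space direction of A
    return X[:3] / X[3]         # dehomogenize
```
Applied per joint and per frame, this produces the 3D pseudo-labels on which a temporal pose network can then be trained.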
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state of the art, but with far fewer parameters.
arXiv Detail & Related papers (2021-05-21T14:41:31Z)
- Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation [28.83582658618296]
We propose a novel method based on a teacher-student learning framework for 3D human pose estimation.
Our method reduces the 3D joint prediction error by 11.4% compared to state-of-the-art unsupervised methods.
arXiv Detail & Related papers (2020-12-17T05:32:44Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train, from scratch, 3D pose regressor networks that outperform the current state of the art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations [73.11883464562895]
We propose a new architecture that facilitates unsupervised, or lightly supervised, learning.
We demonstrate the method by learning 3D human pose and shape from unpaired and unannotated images.
While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
arXiv Detail & Related papers (2020-01-06T14:54:00Z)