Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning
- URL: http://arxiv.org/abs/2001.05097v1
- Date: Wed, 15 Jan 2020 01:31:01 GMT
- Title: Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning
- Authors: Dong-Hyun Hwang, Suntae Kim, Nicolas Monet, Hideki Koike, Soonmin Bae
- Abstract summary: MoVNect is a lightweight deep neural network to capture 3D human pose using a single RGB camera.
We apply knowledge distillation based on teacher-student learning to 3D human pose estimation.
We implement a 3D avatar application that runs on mobile devices in real time to demonstrate that our network achieves both high accuracy and fast inference.
- Score: 15.321557614896268
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present MoVNect, a lightweight deep neural network to capture 3D human
pose using a single RGB camera. To improve the overall performance of the
model, we apply knowledge distillation based on teacher-student learning to 3D
human pose estimation. Real-time post-processing makes the CNN output
temporally stable 3D skeletal information, which can be used directly in
applications. We implement a 3D avatar application that runs on mobile devices
in real time, demonstrating that our network achieves both high accuracy
and fast inference time. Extensive evaluations show the advantages of our
lightweight model with the proposed training method over previous 3D pose
estimation methods on the Human3.6M dataset and mobile devices.
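The paper ships no code; as a rough sketch of the teacher-student recipe described above, the following PyTorch fragment blends ground-truth supervision with a term that makes a lightweight student mimic a frozen teacher. The mixing weight `alpha` and the MSE mimicry term are assumptions, not the authors' exact losses.

```python
import torch
import torch.nn as nn

def distillation_loss(student_joints, teacher_joints, gt_joints, alpha=0.5):
    """Blend ground-truth supervision with mimicry of the teacher.

    All tensors hold root-relative 3D joints of shape (batch, joints, 3);
    `alpha` and the MSE mimicry term are assumptions, not the paper's losses.
    """
    gt_term = nn.functional.mse_loss(student_joints, gt_joints)
    kd_term = nn.functional.mse_loss(student_joints, teacher_joints)
    return alpha * gt_term + (1.0 - alpha) * kd_term

def train_step(student, teacher, optimizer, images, gt_joints):
    """One update of the lightweight student; the large teacher stays frozen.

    `student` and `teacher` stand for any networks mapping an image batch
    (B, 3, H, W) to joint coordinates (B, J, 3).
    """
    teacher.eval()
    with torch.no_grad():
        teacher_joints = teacher(images)
    loss = distillation_loss(student(images), teacher_joints, gt_joints)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The abstract's real-time post-processing for temporal stability is a separate step; a minimal stand-in would be exponential smoothing of consecutive joint predictions, though the paper's actual filter may differ.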
Related papers
- Self-supervised 3D Human Pose Estimation from a Single Image [1.0878040851638]
We propose a new self-supervised method for predicting 3D human body pose from a single image.
The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses.
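The summary does not say how the unpaired 2D poses enter training. A common pattern in this line of work, shown below purely as an assumption about this paper, is an adversarial prior: predicted 3D poses are reprojected from random viewpoints, and a discriminator trained on the unpaired 2D pose set scores how realistic the reprojections look (the `discriminator` network here is a placeholder).

```python
import math
import torch
import torch.nn as nn

def random_reproject(joints_3d):
    """Rotate 3D joints (B, J, 3) about the vertical axis by a random
    angle and drop depth, yielding a synthetic 2D view (B, J, 2)."""
    theta = torch.rand(joints_3d.size(0), 1, 1, device=joints_3d.device) * 2 * math.pi
    x = torch.cos(theta) * joints_3d[..., :1] + torch.sin(theta) * joints_3d[..., 2:3]
    return torch.cat([x, joints_3d[..., 1:2]], dim=-1)

def pose_prior_losses(discriminator, predicted_3d, unpaired_2d):
    """Adversarial terms: the discriminator separates real 2D poses from
    reprojections of predicted 3D poses; the pose network is rewarded
    when its reprojections fool the discriminator."""
    bce = nn.functional.binary_cross_entropy_with_logits
    fake_2d = random_reproject(predicted_3d)
    real_logits = discriminator(unpaired_2d)       # real unpaired 2D poses
    fake_logits = discriminator(fake_2d.detach())  # detached for the D step
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    g_loss = bce(discriminator(fake_2d), torch.ones_like(fake_logits))
    return d_loss, g_loss
```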
arXiv Detail & Related papers (2023-04-05T10:26:21Z)
- Learning to Estimate 3D Human Pose from Point Cloud [13.27496851711973]
We propose a deep human pose network for 3D pose estimation that takes point cloud data as input to model the surface of complex human structures.
Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-the-art methods.
arXiv Detail & Related papers (2022-12-25T14:22:01Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of the limbs by taking advantage of local image evidence, and recovers the 3D pose from these orientations (a minimal conversion is sketched below).
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
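Since PONet predicts limb orientations rather than joint positions, recovering a skeleton is a short forward-kinematics pass once bone lengths are fixed. The sketch below illustrates that conversion; the joint tree and bone lengths are placeholders, not PONet's actual configuration.

```python
import numpy as np

# Hypothetical kinematic tree: child joint -> (parent joint, bone length in m).
# PONet's real joint set and bone lengths are not specified in the summary.
SKELETON = {
    "neck":    ("pelvis", 0.50),
    "head":    ("neck",   0.25),
    "l_knee":  ("pelvis", 0.45),
    "l_ankle": ("l_knee", 0.42),
}

def joints_from_orientations(orientations):
    """Accumulate unit limb directions into 3D joint positions.

    `orientations` maps each child joint to a 3-vector pointing from its
    parent toward it, i.e. what an orientation-only network would predict;
    parents are listed before children so one pass suffices."""
    joints = {"pelvis": np.zeros(3)}
    for child, (parent, length) in SKELETON.items():
        direction = np.asarray(orientations[child], dtype=float)
        direction /= np.linalg.norm(direction)
        joints[child] = joints[parent] + length * direction
    return joints
```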
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multiple-view camera system (see the sketch below).
Our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks.
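Triangulating 2D detections into 3D pseudo-labels is a standard multi-view operation; a minimal direct linear transform (DLT), sketched below under the assumption of calibrated 3x4 projection matrices, recovers each joint independently. This is illustrative, not the authors' code.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Direct linear transform: one 3D point from its 2D observations.

    projections: list of 3x4 camera projection matrices (calibrated rig).
    points_2d:   list of (x, y) pixel coordinates, one per view.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on X = (X, Y, Z, 1).
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least-squares solution: the right singular vector with
    # the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Running this per joint and per frame yields the 3D pseudo-labels that supervise the single-view network.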
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
- Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry [2.7541825072548805]
We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system.
We propose a learning algorithm with four loss functions, which does not require any 2D or 3D body pose ground truth.
arXiv Detail & Related papers (2021-08-17T17:31:24Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts, which differ greatly in scale, from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
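The summary names the depth-to-scale (D2S) projection without defining it; the fragment below is one plausible reading, offered as a guess rather than the paper's formula, in which each joint's 2D scale follows from its depth offset relative to a root depth under a pinhole camera.

```python
import numpy as np

def d2s_project(joints_3d, focal, root_depth):
    """Project root-relative 3D joints (J, 3) to 2D with per-joint scales.

    Under a pinhole model a joint at depth root_depth + dz projects with
    scale focal / (root_depth + dz), so inter-joint depth differences turn
    into per-joint scale variants. All names here are illustrative."""
    dz = joints_3d[:, 2]
    scale = focal / (root_depth + dz)          # (J,) per-joint scales
    return joints_3d[:, :2] * scale[:, None]   # (J, 2) image-plane offsets
```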
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
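A toy version of this synthesis idea, under the assumption that evolution here means recombining body parts of existing poses, might look like the following; the part partition and crossover rule are illustrative only.

```python
import random

# Hypothetical partition of joints into the parts of a hierarchical
# human representation; the paper's actual hierarchy is not given here.
BODY_PARTS = {
    "torso":     ["pelvis", "spine", "neck", "head"],
    "left_arm":  ["l_shoulder", "l_elbow", "l_wrist"],
    "right_arm": ["r_shoulder", "r_elbow", "r_wrist"],
    "left_leg":  ["l_hip", "l_knee", "l_ankle"],
    "right_leg": ["r_hip", "r_knee", "r_ankle"],
}

def crossover(parent_a, parent_b):
    """Build an unseen skeleton by inheriting each body part from one of
    two existing poses (dicts mapping joint name -> 3D coordinates).

    A real system would also re-attach parts so bones stay connected and
    filter out implausible results."""
    child = {}
    for joint_names in BODY_PARTS.values():
        donor = parent_a if random.random() < 0.5 else parent_b
        for name in joint_names:
            child[name] = donor[name]
    return child
```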
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state of the art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation [6.023152721616894]
PoseNet3D takes 2D joints as input and outputs 3D skeletons and SMPL body model parameters.
We first train a teacher network that outputs 3D skeletons, using only 2D poses for training. The teacher network distills its knowledge to a student network that predicts 3D pose in SMPL representation.
Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our approach reduces the 3D joint prediction error by 18% compared to previous unsupervised methods.
arXiv Detail & Related papers (2020-03-07T00:10:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site makes no guarantee about the quality of this information and accepts no responsibility for any consequences of its use.