Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose
Estimation
- URL: http://arxiv.org/abs/2012.09398v1
- Date: Thu, 17 Dec 2020 05:32:44 GMT
- Title: Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose
Estimation
- Authors: Chenxin Xu, Siheng Chen, Maosen Li, Ya Zhang
- Abstract summary: We propose a novel method based on a teacher-student learning framework for 3D human pose estimation.
Our method reduces the 3D joint prediction error by 11.4% compared to state-of-the-art unsupervised methods.
- Score: 28.83582658618296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel method based on a teacher-student learning
framework for 3D human pose estimation without any 3D annotation or side
information. To solve this unsupervised-learning problem, the teacher network
adopts pose-dictionary-based modeling as regularization to estimate a
physically plausible 3D pose. To handle the decomposition ambiguity in the
teacher network, we propose a cycle-consistent architecture promoting a 3D
rotation-invariant property to train the teacher network. To further improve
estimation accuracy, the student network adopts a novel graph convolutional
network, whose flexibility allows it to directly estimate the 3D coordinates.
Another cycle-consistent architecture promoting a 3D rotation-equivariant
property is adopted to exploit geometric consistency and, together with
knowledge distillation from the teacher network, to improve pose estimation
performance. We conduct
extensive experiments on Human3.6M and MPI-INF-3DHP. Our method reduces the 3D
joint prediction error by 11.4% compared to state-of-the-art unsupervised
methods and also outperforms many weakly-supervised methods that use side
information on Human3.6M. Code will be available at
https://github.com/sjtuxcx/ITES.
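The rotation-equivariant cycle consistency described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the `toy_lifter`, the orthographic projection, and the y-axis rotation sampling are all illustrative assumptions standing in for the paper's graph convolutional student network and camera model.

```python
import numpy as np

def random_y_rotation(rng):
    """Sample a random rotation about the vertical (y) axis."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(pose_3d):
    """Orthographic projection onto the image plane (drop depth)."""
    return pose_3d[:, :2]

def equivariant_cycle_loss(lifter, pose_2d, rng):
    """Cycle-consistency loss promoting 3D rotation equivariance.

    Lift -> rotate -> project -> lift again; the re-lifted pose should
    match the rotated estimate, and the L2 gap is the training penalty.
    """
    pose_3d = lifter(pose_2d)              # (J, 3) estimated pose
    R = random_y_rotation(rng)
    rotated = pose_3d @ R.T                # rotate the 3D estimate
    relifted = lifter(project(rotated))    # lift the new 2D view
    return float(np.mean((relifted - rotated) ** 2))

# Toy stand-in for the student: pad 2D joints with a heuristic depth.
def toy_lifter(pose_2d):
    depth = np.linalg.norm(pose_2d, axis=1, keepdims=True)
    return np.concatenate([pose_2d, depth], axis=1)

rng = np.random.default_rng(0)
pose_2d = rng.standard_normal((17, 2))   # 17 joints, e.g. a Human3.6M skeleton
loss = equivariant_cycle_loss(toy_lifter, pose_2d, rng)
print(f"cycle loss: {loss:.4f}")
```

In the paper, the lifter is the graph convolutional student network and this penalty is combined with knowledge distillation from the teacher; the toy lifter here only demonstrates how the cycle penalty is computed.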
Related papers
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and
Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible by introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state-of-the-art, but with much fewer parameters.
arXiv Detail & Related papers (2021-05-21T14:41:31Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast, bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation [2.8802646903517957]
2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room.
We propose to use knowledge distillation in a teacher/student framework to harness the knowledge present in a large-scale non-annotated dataset.
The easily deployable network trained using this effective self-supervision strategy performs on par with the teacher network on MVOR+, an extension of the public MVOR dataset.
arXiv Detail & Related papers (2020-07-16T14:28:22Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation [6.023152721616894]
PoseNet3D takes 2D joints as input and outputs 3D skeletons and SMPL body model parameters.
We first train a teacher network that outputs 3D skeletons, using only 2D poses for training. The teacher network distills its knowledge to a student network that predicts 3D pose in SMPL representation.
Results on Human3.6M dataset for 3D human pose estimation demonstrate that our approach reduces the 3D joint prediction error by 18% compared to previous unsupervised methods.
arXiv Detail & Related papers (2020-03-07T00:10:59Z)
- Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning [15.321557614896268]
MoVNect is a lightweight deep neural network to capture 3D human pose using a single RGB camera.
We apply knowledge distillation, based on teacher-student learning, to 3D human pose estimation.
We implement a 3D avatar application running on mobile in real-time to demonstrate that our network achieves both high accuracy and fast inference time.
arXiv Detail & Related papers (2020-01-15T01:31:01Z)
- Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations [73.11883464562895]
We propose a new architecture that facilitates unsupervised, or lightly supervised, learning.
We demonstrate the method by learning 3D human pose and shape from unpaired and unannotated images.
While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
arXiv Detail & Related papers (2020-01-06T14:54:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all summaries) and is not responsible for any consequences of its use.