PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and
Hallucination under Self-supervision
- URL: http://arxiv.org/abs/2203.15625v1
- Date: Tue, 29 Mar 2022 14:45:53 GMT
- Authors: Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang,
Michael Bi Mi, Jiashi Feng, Xinchao Wang
- Abstract summary: Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision to guide learning.
We propose a novel self-supervised approach that explicitly generates 2D-3D pose pairs to augment supervision.
This is made possible by introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator and a pose hallucinator.
- Score: 102.48681650013698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing self-supervised 3D human pose estimation schemes have largely relied
on weak supervisions like consistency loss to guide the learning, which,
inevitably, leads to inferior results in real-world scenarios with unseen
poses. In this paper, we propose a novel self-supervised approach that allows
us to explicitly generate 2D-3D pose pairs for augmenting supervision, through
a self-enhancing dual-loop learning framework. This is made possible via
introducing a reinforcement-learning-based imitator, which is learned jointly
with a pose estimator alongside a pose hallucinator; the three components form
two loops during the training process, complementing and strengthening one
another. Specifically, the pose estimator transforms an input 2D pose sequence
to a low-fidelity 3D output, which is then enhanced by the imitator that
enforces physical constraints. The refined 3D poses are subsequently fed to the
hallucinator for producing even more diverse data, which are, in turn,
strengthened by the imitator and further utilized to train the pose estimator.
Such a co-evolution scheme, in practice, enables training a pose estimator on
self-generated motion data without relying on any given 3D data. Extensive
experiments across various benchmarks demonstrate that our approach yields
encouraging results significantly outperforming the state of the art and, in
some cases, even on par with results of fully-supervised methods. Notably, it
achieves 89.1% 3D PCK on MPI-INF-3DHP under self-supervised cross-dataset
evaluation setup, improving upon the previous best self-supervised methods by
8.6%. Code can be found at: https://github.com/Garfield-kh/PoseTriplet
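The data flow of the two loops described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the repository linked above for that): every function here is a hypothetical stand-in. The estimator is assumed to be a linear 2D-to-3D map, the RL-based imitator is replaced by simple temporal smoothing, the hallucinator by Gaussian perturbation, and the camera by an orthographic projection. Only the ordering of the steps follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)


def estimate_3d(weights, poses_2d):
    """Toy estimator (assumption): lift (T, J, 2) poses to (T, J, 3) linearly."""
    return poses_2d @ weights  # (T, J, 2) @ (2, 3) -> (T, J, 3)


def imitate(poses_3d):
    """Toy imitator (assumption): temporal smoothing stands in for the
    physics-based refinement that the paper learns with reinforcement learning."""
    refined = poses_3d.copy()
    refined[1:-1] = (poses_3d[:-2] + poses_3d[1:-1] + poses_3d[2:]) / 3.0
    return refined


def hallucinate(poses_3d, scale=0.1):
    """Toy hallucinator (assumption): perturb refined motion to diversify it."""
    return poses_3d + rng.normal(scale=scale, size=poses_3d.shape)


def project(poses_3d):
    """Orthographic projection (assumption): drop the last coordinate."""
    return poses_3d[..., :2]


def fit_estimator(poses_2d, poses_3d):
    """Refit the linear estimator on self-generated 2D-3D pairs."""
    x = poses_2d.reshape(-1, 2)
    y = poses_3d.reshape(-1, 3)
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def co_evolve(poses_2d, rounds=3):
    """Run the two loops for a few rounds, using only 2D input."""
    weights = rng.normal(size=(2, 3))  # random initial estimator
    for _ in range(rounds):
        # Loop 1: estimator -> imitator (coarse 3D output is refined).
        refined = imitate(estimate_3d(weights, poses_2d))
        # Loop 2: hallucinator -> imitator -> estimator.
        novel = imitate(hallucinate(refined))
        train_3d = np.concatenate([refined, novel])
        train_2d = project(train_3d)  # paired 2D-3D training data
        weights = fit_estimator(train_2d, train_3d)
    return weights


if __name__ == "__main__":
    toy_2d = rng.normal(size=(20, 17, 2))  # 20 frames, 17 joints
    print(co_evolve(toy_2d).shape)
```

The point of the sketch is that no 3D ground truth enters `co_evolve`: the 3D targets used to refit the estimator are produced by the loop itself and then projected back to 2D to form training pairs.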
Related papers
- CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by
Leveraging In-the-wild 2D Annotations [25.05308239278207]
We present CameraPose, a weakly-supervised framework for 3D human pose estimation from a single image.
By adding a camera parameter branch, any in-the-wild 2D annotations can be fed into our pipeline to boost the training diversity.
We also introduce a refinement network module with confidence-guided loss to further improve the quality of noisy 2D keypoints extracted by 2D pose estimators.
arXiv Detail & Related papers (2023-01-08T05:07:41Z)
- Optimising 2D Pose Representation: Improve Accuracy, Stability and
Generalisability Within Unsupervised 2D-3D Human Pose Estimation [7.294965109944706]
We show that the optimal representation of a 2D pose is two independent segments, the torso and the legs, with no features shared between the lifting networks.
arXiv Detail & Related papers (2022-09-01T17:32:52Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose
Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of these limbs by taking advantage of the local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state of the art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)