Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation
- URL: http://arxiv.org/abs/2006.14107v1
- Date: Wed, 24 Jun 2020 23:56:33 GMT
- Authors: Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, R.
Venkatesh Babu, Anirban Chakraborty
- Abstract summary: Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework that does not require any paired or unpaired weak supervision.
Our proposed model employs three consecutive differentiable transformations: forward-kinematics, camera-projection, and spatial-map transformation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimation of 3D human pose from monocular image has gained considerable
attention, as a key step to several human-centric applications. However,
generalizability of human pose estimation models developed using supervision on
large-scale in-studio datasets remains questionable, as these models often
perform unsatisfactorily on unseen in-the-wild environments. Though
weakly-supervised models have been proposed to address this shortcoming,
performance of such models relies on availability of paired supervision on some
related tasks, such as 2D pose or multi-view image pairs. In contrast, we
propose a novel kinematic-structure-preserved unsupervised 3D pose estimation
framework, which is not restrained by any paired or unpaired weak supervision.
Our pose estimation framework relies on a minimal set of prior knowledge that
defines the underlying kinematic 3D structure, such as skeletal joint
connectivity information with bone-length ratios in a fixed canonical scale.
The proposed model employs three consecutive differentiable transformations:
forward-kinematics, camera-projection and spatial-map transformation.
This design not only acts as a suitable bottleneck that stimulates effective pose
disentanglement but also yields interpretable latent pose representations,
avoiding the training of an explicit latent-embedding-to-pose mapper. Furthermore,
devoid of an unstable adversarial setup, we re-utilize the decoder to formalize an
energy-based loss, which enables us to learn from in-the-wild videos beyond
laboratory settings. Comprehensive experiments demonstrate our state-of-the-art
unsupervised and weakly-supervised pose estimation performance on both
Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments
further establish our superior generalization ability.
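The abstract's pipeline (pose parameters → forward-kinematics → camera-projection → 2D coordinates) can be sketched as follows. This is a minimal illustrative toy under stated assumptions, not the authors' implementation: a three-bone planar chain with one rotational degree of freedom per joint, fixed bone-length ratios in a canonical scale, and a simple pinhole projection. The spatial-map transformation (rendering projected joints into image-like maps) is omitted, and all function names are illustrative.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis (the single rotational DoF per joint in this toy chain)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def forward_kinematics(angles, bone_lengths):
    """Accumulate joint rotations along fixed bone offsets to get 3D joint positions."""
    joints = [np.zeros(3)]              # root joint at the origin (canonical scale)
    R = np.eye(3)
    for theta, length in zip(angles, bone_lengths):
        R = R @ rot_z(theta)            # rotation accumulates down the kinematic chain
        joints.append(joints[-1] + R @ np.array([length, 0.0, 0.0]))
    return np.stack(joints)             # shape: (num_bones + 1, 3)

def camera_projection(joints_3d, focal=1.0, depth_offset=5.0):
    """Pinhole projection of camera-frame 3D joints to 2D image coordinates."""
    z = joints_3d[:, 2] + depth_offset  # place the skeleton in front of the camera
    return focal * joints_3d[:, :2] / z[:, None]

# Toy pose: three joint angles plus bone-length ratios in a fixed canonical scale.
angles = np.array([0.3, -0.2, 0.1])
bones = np.array([1.0, 0.8, 0.5])
pts3d = forward_kinematics(angles, bones)   # (4, 3) joint positions
pts2d = camera_projection(pts3d)            # (4, 2) image-plane coordinates
```

Because every step is composed of differentiable NumPy-style operations (matrix products, division), the same structure ports directly to an autodiff framework, which is what lets the transformations act as a structural bottleneck during unsupervised training.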
Related papers
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
Articulation-centric 2D/3D pose supervision forms the core training objective in most existing 3D human pose estimation techniques.
We propose a novel framework that relies only on silhouette supervision to adapt a source-trained model-based regressor.
We develop a series of convolution-friendly spatial transformations in order to disentangle a topological-skeleton representation from the raw silhouette.
arXiv Detail & Related papers (2022-04-04T06:58:15Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.