Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation
- URL: http://arxiv.org/abs/2204.01971v2
- Date: Wed, 6 Apr 2022 07:29:19 GMT
- Title: Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation
- Authors: Jogendra Nath Kundu, Siddharth Seth, Anirudh Jamkhandi, Pradyumna YM,
Varun Jampani, Anirban Chakraborty, R. Venkatesh Babu
- Abstract summary: 3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
- Score: 63.199549837604444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Available 3D human pose estimation approaches leverage different forms of
strong (2D/3D pose) or weak (multi-view or depth) paired supervision. Barring
synthetic or in-studio domains, acquiring such supervision for each new target
environment is highly inconvenient. To this end, we cast 3D pose learning as a
self-supervised adaptation problem that aims to transfer the task knowledge
from a labeled source domain to a completely unpaired target. We propose to
infer image-to-pose via two explicit mappings, viz. image-to-latent and
latent-to-pose, where the latter is a pre-learned decoder obtained from a
prior-enforcing generative adversarial auto-encoder. Next, we introduce
relation distillation as a means to align unpaired cross-modal samples, i.e.,
the unpaired target videos and the unpaired 3D pose sequences. To this end, we
propose a new set of non-local relations in order to characterize long-range
latent pose interactions unlike general contrastive relations where positive
couplings are limited to a local neighborhood structure. Further, we provide an
objective way to quantify non-localness in order to select the most effective
relation set. We evaluate different self-adaptation settings and demonstrate
state-of-the-art 3D human pose estimation performance on standard benchmarks.
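The relation-distillation idea in the abstract — characterizing long-range ("non-local") couplings among latent pose codes and aligning those relation statistics across the unpaired video and pose modalities — can be illustrated with a minimal NumPy sketch. All function names, the cosine-similarity relation, and the mean-gap non-localness proxy below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def non_local_pairs(seq_len, min_gap):
    """Index pairs separated by at least `min_gap` steps: long-range couplings,
    as opposed to contrastive relations limited to a local neighborhood."""
    return [(i, j) for i in range(seq_len) for j in range(i + min_gap, seq_len)]

def non_localness(pairs):
    """A simple objective proxy for non-localness: mean temporal gap of the
    relation set (larger gap -> more non-local)."""
    return float(np.mean([abs(j - i) for i, j in pairs]))

def pairwise_relations(latents, pairs):
    """Cosine similarity between selected latent codes (one relation per pair)."""
    z = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    return np.array([z[i] @ z[j] for i, j in pairs])

def relation_distillation_loss(img_latents, pose_latents, pairs):
    """Align the relation statistics of the two unpaired modalities by an L2
    penalty on their relation vectors."""
    r_img = pairwise_relations(img_latents, pairs)
    r_pose = pairwise_relations(pose_latents, pairs)
    return float(np.mean((r_img - r_pose) ** 2))

# Usage sketch: latents from unpaired video frames and 3D pose sequences.
rng = np.random.default_rng(0)
pairs = non_local_pairs(seq_len=8, min_gap=4)   # only long-range couplings
z_video = rng.normal(size=(8, 16))               # image-to-latent outputs
z_pose = rng.normal(size=(8, 16))                # encoded 3D pose sequence
loss = relation_distillation_loss(z_video, z_pose, pairs)
```

In the paper's setting the loss would be minimized over the image-to-latent encoder while the latent-to-pose decoder stays frozen; `min_gap` controls the trade-off that the proposed non-localness measure is meant to select objectively.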
Related papers
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.