Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
- URL: http://arxiv.org/abs/2008.01388v1
- Date: Tue, 4 Aug 2020 07:54:25 GMT
- Title: Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
- Authors: Jogendra Nath Kundu, Ambareesh Revanur, Govind Vitthal Waghmare, Rahul
Mysore Venkatesh, R. Venkatesh Babu
- Abstract summary: We present a deployment-friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
- Score: 52.94078950641959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a deployment-friendly, fast bottom-up framework for multi-person
3D human pose estimation. We adopt a novel neural representation of
multi-person 3D pose which unifies the position of person instances with their
corresponding 3D pose representation. This is realized by learning a generative
pose embedding which not only ensures plausible 3D pose predictions, but also
eliminates the usual keypoint grouping operation as employed in prior bottom-up
approaches. Further, we propose a practical deployment paradigm where paired 2D
or 3D pose annotations are unavailable. In the absence of any paired
supervision, we leverage a frozen network, as a teacher model, which is trained
on an auxiliary task of multi-person 2D pose estimation. We cast the learning
as a cross-modal alignment problem and propose training objectives to realize a
shared latent space between two diverse modalities. We aim to enhance the
model's ability to perform beyond the limiting teacher network by enriching the
latent-to-3D pose mapping using artificially synthesized multi-person 3D scene
samples. Our approach not only generalizes to in-the-wild images, but also
yields a superior trade-off between speed and performance, compared to prior
top-down approaches. Our approach also yields state-of-the-art multi-person 3D
pose estimation performance among the bottom-up approaches under consistent
supervision levels.
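As a rough illustration of the cross-modal alignment idea (a frozen 2D-pose teacher and a trainable 3D-pose encoder pulled into a shared latent space), here is a minimal numpy sketch. The joint counts, the linear encoders, and the alignment-by-gradient-descent loop are all hypothetical simplifications for illustration, not the paper's actual architecture or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 15 joints -> 30-dim 2D poses, 45-dim 3D poses.
D2, D3, LATENT = 30, 45, 16
pose2d = rng.normal(size=(8, D2))   # batch of 2D poses (teacher modality)
pose3d = rng.normal(size=(8, D3))   # corresponding 3D poses

W_teacher = rng.normal(size=(D2, LATENT)) * 0.1  # frozen 2D encoder (teacher)
W_student = np.zeros((D3, LATENT))               # trainable 3D encoder

z_teacher = pose2d @ W_teacher  # teacher latents serve as fixed targets

# Align the 3D encoder's latents to the teacher's by gradient descent
# on the mean squared latent distance (a toy cross-modal alignment loss).
lr = 0.05
for _ in range(500):
    z_student = pose3d @ W_student
    grad = pose3d.T @ (z_student - z_teacher) / len(pose3d)
    W_student -= lr * grad

loss = np.mean((pose3d @ W_student - z_teacher) ** 2)
print(loss < 0.01)
```

With linear encoders this alignment reduces to a least-squares regression; in the paper both modalities map into a shared latent through learned nonlinear networks, but the objective, pulling two modalities' embeddings together, has the same shape.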
Related papers
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and weakly-supervised learning.
We propose to impose multi-view geometric constraints by means of differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
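The triangulation underlying this kind of self-supervision can be sketched with the classic linear (DLT) method: each view contributes two linear constraints on the homogeneous 3D point, and the solution is read off an SVD. The toy cameras and point below are hypothetical; the paper's pipeline would run such a step differentiably inside a training loop rather than as a standalone SVD call.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: observed 2D image points (u, v) in each view.
    Returns the 3D point in world coordinates.
    """
    # Each observation gives two rows of the homogeneous system A X = 0.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity pose and a 1-unit horizontal translation.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])

X_hat = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_hat, X_true, atol=1e-6))  # True
```

With exact (noise-free) observations the DLT solution recovers the point up to machine precision; with noisy 2D detections, the reprojection error of the triangulated point is what can serve as a self-supervision signal.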
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- 3D Human Pose Estimation Based on 2D-3D Consistency with Synchronized Adversarial Training [5.306053507202384]
We propose a GAN-based model for 3D human pose estimation, in which a reprojection network is employed to learn the mapping of the distribution from 3D poses to 2D poses.
Inspired by the typical kinematic chain space (KCS) matrix, we introduce a weighted KCS matrix and take it as one of the discriminator's inputs to impose joint angle and bone length constraints.
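The KCS construction mentioned above can be sketched in a few lines of numpy: bone vectors are formed from joint positions via a joint-to-bone incidence matrix C, and the Gram matrix B^T B collects squared bone lengths on its diagonal and inter-bone angle information off it. The 5-joint skeleton and the uniform weight matrix W below are hypothetical placeholders, not the paper's skeleton or learned weights.

```python
import numpy as np

# Hypothetical 5-joint chain: pelvis, spine, neck, head, plus one arm joint.
joints = np.array([  # 3 x J, columns are 3D joint positions
    [0.0, 0.0, 0.0, 0.0, 0.3],
    [0.0, 0.4, 0.8, 1.0, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
bones_idx = [(0, 1), (1, 2), (2, 3), (2, 4)]  # (parent, child) pairs

# C maps joints to bones: +1 at the parent row, -1 at the child row.
J, n_bones = joints.shape[1], len(bones_idx)
C = np.zeros((J, n_bones))
for b, (p, c) in enumerate(bones_idx):
    C[p, b], C[c, b] = 1.0, -1.0

B = joints @ C   # 3 x n_bones matrix of bone vectors
KCS = B.T @ B    # Kinematic Chain Space matrix

# Diagonal entries are squared bone lengths; off-diagonals are dot
# products of bone vectors, i.e. bone-length-scaled joint angles.
print(np.sqrt(np.diag(KCS)))  # bone lengths

# A weighted KCS, as one of the discriminator's inputs, could emphasize
# particular bone pairs; W here is a hypothetical all-ones weight matrix.
W = np.ones_like(KCS)
weighted_KCS = W * KCS
```

Because KCS entries depend only on bone vectors, a discriminator fed this matrix sees bone-length and joint-angle statistics directly, which is what makes it a convenient carrier for anatomical constraints.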
arXiv Detail & Related papers (2021-06-08T12:11:56Z)
- Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods degrade when the target person is occluded by other objects, or when the motion is too fast or slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large-scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.