CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the
Wild
- URL: http://arxiv.org/abs/2011.14679v1
- Date: Mon, 30 Nov 2020 10:42:27 GMT
- Title: CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the
Wild
- Authors: Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, Bodo
Rosenhahn
- Abstract summary: We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
- Score: 31.334715988245748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human pose estimation from single images is a challenging problem in computer
vision that requires large amounts of labeled training data to be solved
accurately. Unfortunately, for many human activities (\eg outdoor sports) such
training data does not exist and is hard or even impossible to acquire with
traditional motion capture systems. We propose a self-supervised approach that
learns a single image 3D pose estimator from unlabeled multi-view data. To this
end, we exploit multi-view consistency constraints to disentangle the observed
2D pose into the underlying 3D pose and camera rotation. In contrast to most
existing methods, we do not require calibrated cameras and can therefore learn
from moving cameras. Nevertheless, in the case of a static camera setup, we
present an optional extension to include constant relative camera rotations
over multiple views into our framework. Key to the success are new, unbiased
reconstruction objectives that mix information across views and training
samples. The proposed approach is evaluated on two benchmark datasets
(Human3.6M and MPII-INF-3DHP) and on the in-the-wild SkiPose dataset.
Related papers
- Self-learning Canonical Space for Multi-view 3D Human Pose Estimation [57.969696744428475]
Multi-view 3D human pose estimation is naturally superior to single view one.
The accurate annotation of these information is hard to obtain.
We propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet)
CMANet is superior to state-of-the-art methods in extensive quantitative and qualitative analysis.
arXiv Detail & Related papers (2024-03-19T04:54:59Z) - Multi-View Person Matching and 3D Pose Estimation with Arbitrary
Uncalibrated Camera Networks [36.49915280876899]
Cross-view person matching and 3D human pose estimation in multi-camera networks are difficult when the cameras are extrinsically uncalibrated.
Existing efforts require large amounts of 3D data for training neural networks or known camera poses for geometric constraints to solve the problem.
We present a method, PME, that solves the two tasks without requiring either information.
arXiv Detail & Related papers (2023-12-04T01:28:38Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual
Data [69.64723752430244]
We introduce VirtualPose, a two-stage learning framework to exploit the hidden "free lunch" specific to this task.
The first stage transforms images to abstract geometry representations (AGR), and then the second maps them to 3D poses.
It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses.
arXiv Detail & Related papers (2022-07-20T14:47:28Z) - ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera
Elevation and Learning Normalizing Flows on 2D Poses [23.554957518485324]
We propose an unsupervised approach that learns to predict a 3D human pose from a single image.
We estimate the 3D pose that is most likely over random projections, with the likelihood estimated using normalizing flows on 2D poses.
We outperform the state-of-the-art unsupervised human pose estimation methods on the benchmark datasets Human3.6M and MPI-INF-3DHP in many metrics.
arXiv Detail & Related papers (2021-12-14T01:12:45Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation
from Video [23.00696619207748]
Estimating 3D human poses from video is a challenging problem.
The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets.
We propose a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras.
arXiv Detail & Related papers (2021-05-14T00:46:48Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.