TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation
from Video
- URL: http://arxiv.org/abs/2105.06599v1
- Date: Fri, 14 May 2021 00:46:48 GMT
- Authors: Mohsen Gholami, Ahmad Rezaei, Helge Rhodin, Rabab Ward and Z. Jane
Wang
- Abstract summary: Estimating 3D human poses from video is a challenging problem.
The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets.
We propose a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras.
- Score: 23.00696619207748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating 3D human poses from video is a challenging problem. The lack of 3D
human pose annotations is a major obstacle for supervised training and for
generalization to unseen datasets. In this work, we address this problem by
proposing a weakly-supervised training scheme that does not require 3D
annotations or calibrated cameras. The proposed method relies on temporal
information and triangulation. Using 2D poses from multiple views as the input,
we first estimate the relative camera orientations and then generate 3D poses
via triangulation. The triangulation is only applied to the views with high 2D
human joint confidence. The generated 3D poses are then used to train a
recurrent lifting network (RLN) that estimates 3D poses from 2D poses. We
further apply a multi-view re-projection loss to the estimated 3D poses and
enforce the 3D poses estimated from multiple views to be consistent. Our method
therefore relaxes the usual constraints: only multi-view videos are required for
training, which makes it convenient for in-the-wild settings. At inference, RLN
requires only single-view videos. The proposed method
outperforms previous works on two challenging datasets, Human3.6M and
MPI-INF-3DHP. Codes and pretrained models will be publicly available.
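The pipeline described in the abstract generates pseudo ground truth by triangulating 2D poses from views with high joint confidence. A minimal sketch of that step, assuming known camera projection matrices and using standard Direct Linear Transform (DLT) triangulation; the function names and the confidence threshold are illustrative, not from the paper:

```python
import numpy as np

def triangulate_joint(P1, P2, x1, x2):
    """DLT triangulation of one joint from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D joint locations (u, v) in each view.
    Returns the 3D point in world coordinates.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the smallest
    # singular value, dehomogenized.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def triangulate_confident(P_list, x_list, conf, thresh=0.8):
    """Confidence gating as described in the abstract: drop views whose 2D
    joint confidence is at or below `thresh` (hypothetical value), then
    triangulate from the first two surviving views."""
    keep = [i for i, c in enumerate(conf) if c > thresh]
    if len(keep) < 2:
        return None  # not enough reliable views for triangulation
    i, j = keep[0], keep[1]
    return triangulate_joint(P_list[i], P_list[j], x_list[i], x_list[j])
```

In the paper's full scheme the triangulated joints would then serve as targets for the recurrent lifting network, with the multi-view re-projection loss enforcing cross-view consistency; the sketch above covers only the triangulation step.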
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases up to 45% in MPJPE errors compared to the 3D pose obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses [23.554957518485324]
We propose an unsupervised approach that learns to predict a 3D human pose from a single image.
We estimate the 3D pose that is most likely over random projections, with the likelihood estimated using normalizing flows on 2D poses.
We outperform the state-of-the-art unsupervised human pose estimation methods on the benchmark datasets Human3.6M and MPI-INF-3DHP in many metrics.
arXiv Detail & Related papers (2021-12-14T01:12:45Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos [32.12879364117658]
Estimating 3D hand pose directly from RGB images is challenging but has gained steady progress recently by training deep models with annotated 3D poses.
We propose a new framework for training 3D pose estimation models from RGB images without using explicit 3D annotations.
arXiv Detail & Related papers (2020-12-06T07:54:18Z)
- CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train from scratch 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals, and then lifting them to 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.