Learning Temporal 3D Human Pose Estimation with Pseudo-Labels
- URL: http://arxiv.org/abs/2110.07578v1
- Date: Thu, 14 Oct 2021 17:40:45 GMT
- Title: Learning Temporal 3D Human Pose Estimation with Pseudo-Labels
- Authors: Arij Bouazizi and Ulrich Kressel and Vasileios Belagiannis
- Abstract summary: We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multi-view camera system.
Our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks.
- Score: 3.0954251281114513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple, yet effective, approach for self-supervised 3D human
pose estimation. Unlike prior work, we exploit temporal information in addition to
multi-view self-supervision. During training, we rely on triangulating 2D body pose
estimates from a multi-view camera system to generate 3D pseudo ground truth. A
temporal convolutional neural network is trained on this generated 3D ground truth
together with a geometric multi-view consistency loss that imposes geometric
constraints on the predicted 3D body skeleton. During inference, our model receives
a sequence of 2D body pose estimates from a single view and predicts the 3D body
pose for each frame. An extensive evaluation shows that our method achieves
state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks. Our code
and models are publicly available at \url{https://github.com/vru2020/TM_HPE/}.
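The abstract describes two ingredients: triangulating synchronized multi-view 2D detections into 3D pseudo-labels, and a temporal convolutional network that lifts a window of single-view 2D poses to per-frame 3D poses. The sketch below illustrates both ideas in plain NumPy/PyTorch; the joint count, layer widths, and calibrated projection matrices are illustrative assumptions, and this is not the authors' released implementation (see the linked repository for that).
```python
# A minimal sketch (NOT the authors' released code) of the two ingredients the
# abstract describes: (1) DLT triangulation of synchronized multi-view 2D pose
# estimates into 3D pseudo-labels, and (2) a small temporal convolutional network
# that lifts a window of single-view 2D poses to per-frame 3D poses.
# Joint count, layer widths, and the camera projection matrices are assumptions.
import numpy as np
import torch
import torch.nn as nn


def triangulate_joint(projections, points_2d):
    """Triangulate one joint seen in several calibrated views via the DLT.

    projections: list of 3x4 camera projection matrices.
    points_2d:   list of (x, y) pixel coordinates, one per view.
    Returns the joint's 3D position in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)                      # (2 * n_views, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                              # null-space vector = homogeneous 3D point
    return X[:3] / X[3]


def make_pseudo_labels(projections, poses_2d):
    """poses_2d: (n_views, n_joints, 2) detections of one frame -> (n_joints, 3)."""
    n_joints = poses_2d.shape[1]
    return np.stack([triangulate_joint(projections, poses_2d[:, j])
                     for j in range(n_joints)])


class TemporalLifter(nn.Module):
    """1D temporal convolutions over a window of 2D poses -> a 3D pose per frame."""

    def __init__(self, n_joints=17, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * n_joints, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            # A dilated convolution widens the temporal receptive field.
            nn.Conv1d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(channels, 3 * n_joints, kernel_size=1),
        )

    def forward(self, x):                   # x: (batch, frames, n_joints, 2)
        b, t, j, _ = x.shape
        x = x.reshape(b, t, 2 * j).transpose(1, 2)    # (b, 2J, T)
        out = self.net(x).transpose(1, 2)             # (b, T, 3J)
        return out.reshape(b, t, j, 3)


if __name__ == "__main__":
    # Toy run with synthetic cameras and detections, just to check the shapes.
    rng = np.random.default_rng(0)
    projections = [rng.normal(size=(3, 4)) for _ in range(4)]
    poses_2d = rng.normal(size=(4, 17, 2))
    print(make_pseudo_labels(projections, poses_2d).shape)    # (17, 3)
    print(TemporalLifter()(torch.randn(2, 27, 17, 2)).shape)  # (2, 27, 17, 3)
```
During training, the triangulated poses serve as regression targets, and the paper's geometric multi-view consistency loss additionally ties together the lifter's predictions across views of the same motion; at test time only the single-view temporal branch is used.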
Related papers
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Unsupervised Multi-Person 3D Human Pose Estimation From 2D Poses Alone [4.648549457266638]
We present one of the first studies investigating the feasibility of unsupervised multi-person 2D-3D pose estimation.
Our method involves independently lifting each subject's 2D pose to 3D, before combining them in a shared 3D coordinate system.
This by itself enables us to retrieve an accurate 3D reconstruction of their poses.
arXiv Detail & Related papers (2023-09-26T11:42:56Z)
- Self-supervised 3D Human Pose Estimation from a Single Image [1.0878040851638]
We propose a new self-supervised method for predicting 3D human body pose from a single image.
The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses.
arXiv Detail & Related papers (2023-04-05T10:26:21Z)
- Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry [2.7541825072548805]
We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system.
We propose a learning algorithm with four loss functions, which does not require any 2D or 3D body pose ground truth.
arXiv Detail & Related papers (2021-08-17T17:31:24Z)
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
- Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos [33.974241749058585]
We propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses.
In particular, we introduce a human-joint GCN, which employs the 2D pose estimator's confidence scores to improve the pose estimation results, together with a human-bone GCN that exploits bone connectivity.
The two GCNs work together to estimate the spatial, frame-wise 3D poses and can make use of both visible joint and bone information in the target frame to estimate occluded or missing human-part information.
arXiv Detail & Related papers (2020-12-22T03:01:19Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D meshes of multiple body parts with large scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner.
arXiv Detail & Related papers (2020-05-05T04:25:16Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.