Self-supervised 3D Human Pose Estimation from a Single Image
- URL: http://arxiv.org/abs/2304.02349v1
- Date: Wed, 5 Apr 2023 10:26:21 GMT
- Title: Self-supervised 3D Human Pose Estimation from a Single Image
- Authors: Jose Sosa and David Hogg
- Abstract summary: We propose a new self-supervised method for predicting 3D human body pose from a single image.
The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new self-supervised method for predicting 3D human body pose
from a single image. The prediction network is trained from a dataset of
unlabelled images depicting people in typical poses and a set of unpaired 2D
poses. By minimising the need for annotated data, the method has the potential
for rapid application to pose estimation of other articulated structures (e.g.
animals). The self-supervision comes from an earlier idea exploiting
consistency between predicted pose under 3D rotation. Our method is a
substantial advance on state-of-the-art self-supervised methods in training a
mapping directly from images, without limb articulation constraints or any 3D
empirical pose prior. We compare performance with state-of-the-art
self-supervised methods using benchmark datasets that provide images and
ground-truth 3D pose (Human3.6M, MPI-INF-3DHP). Despite the reduced requirement
for annotated data, we show that the method outperforms on Human3.6M and
matches performance on MPI-INF-3DHP. Qualitative results on a dataset of human
hands show the potential for rapidly learning to predict 3D pose for
articulated structures other than the human body.
Related papers
- ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z) - Understanding Pose and Appearance Disentanglement in 3D Human Pose
Estimation [72.50214227616728]
Several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z) - Learning to Estimate 3D Human Pose from Point Cloud [13.27496851711973]
We propose a deep human pose network for 3D pose estimation by taking the point cloud data as input data to model the surface of complex human structures.
Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-art methods.
arXiv Detail & Related papers (2022-12-25T14:22:01Z) - PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of these limbs by taking advantage of the local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z) - ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera
Elevation and Learning Normalizing Flows on 2D Poses [23.554957518485324]
We propose an unsupervised approach that learns to predict a 3D human pose from a single image.
We estimate the 3D pose that is most likely over random projections, with the likelihood estimated using normalizing flows on 2D poses.
We outperform the state-of-the-art unsupervised human pose estimation methods on the benchmark datasets Human3.6M and MPI-INF-3DHP in many metrics.
arXiv Detail & Related papers (2021-12-14T01:12:45Z) - Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates of a multiple-view camera system.
Our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z) - Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D
Shape, Pose, and Appearance Consistency [55.94908688207493]
We propose a self-supervised framework named SPICE that closes the image quality gap with supervised methods.
The key insight enabling self-supervision is to exploit 3D information about the human body in several ways.
SPICE achieves state-of-the-art performance on the DeepFashion dataset.
arXiv Detail & Related papers (2021-10-11T17:48:50Z) - Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry [2.7541825072548805]
We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system.
We propose a four-loss function learning algorithm, which does not require any 2D or 3D body pose ground-truth.
arXiv Detail & Related papers (2021-08-17T17:31:24Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.