Estimating Egocentric 3D Human Pose in the Wild with External Weak
Supervision
- URL: http://arxiv.org/abs/2201.07929v1
- Date: Thu, 20 Jan 2022 00:45:13 GMT
- Authors: Jian Wang and Lingjie Liu and Weipeng Xu and Kripasindhu Sarkar and
Diogo Luvizon and Christian Theobalt
- Abstract summary: We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Egocentric 3D human pose estimation with a single fisheye camera has drawn a
significant amount of attention recently. However, existing methods struggle
with pose estimation from in-the-wild images, because they can only be trained
on synthetic data due to the unavailability of large-scale in-the-wild
egocentric datasets. Furthermore, these methods easily fail when the body parts
are occluded by or interacting with the surrounding scene. To address the
shortage of in-the-wild data, we collect a large-scale in-the-wild egocentric
dataset called Egocentric Poses in the Wild (EgoPW). This dataset is captured
by a head-mounted fisheye camera and an auxiliary external camera, which
provides an additional observation of the human body from a third-person
perspective during training. We present a new egocentric pose estimation
method, which can be trained on the new dataset with weak external supervision.
Specifically, we first generate pseudo labels for the EgoPW dataset with a
spatio-temporal optimization method by incorporating the external-view
supervision. The pseudo labels are then used to train an egocentric pose
estimation network. To facilitate the network training, we propose a novel
learning strategy to supervise the egocentric features with the high-quality
features extracted by a pretrained external-view pose estimation model. The
experiments show that our method predicts accurate 3D poses from a single
in-the-wild egocentric image and outperforms the state-of-the-art methods both
quantitatively and qualitatively.
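The abstract does not spell out the exact form of the external weak supervision. As an illustrative sketch only (the function name and loss form are assumptions, not the authors' implementation), the learning strategy can be read as a feature-mimicry term that pulls the egocentric network's features toward those of the frozen, pretrained external-view model:

```python
import numpy as np

def feature_mimicry_loss(ego_feat: np.ndarray, ext_feat: np.ndarray) -> float:
    """Mean-squared distance between egocentric features and the (frozen)
    external-view features. Both maps are assumed to share one shape,
    e.g. (C, H, W); in training, gradients would flow only into ego_feat."""
    assert ego_feat.shape == ext_feat.shape
    return float(np.mean((ego_feat - ext_feat) ** 2))
```

Minimizing such a term makes the egocentric encoder imitate the higher-quality external-view representation even when no 3D labels exist for a frame.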
Related papers
- Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views [9.476008200056082]
Ego3DPose is a highly accurate binocular egocentric 3D pose reconstruction system.
We propose a two-path network architecture with a path that estimates pose per limb independently with its binocular heatmaps.
We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs.
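The paper's trigonometric representation over binocular heatmaps is not detailed in this summary. As a simplified stand-in (not the authors' formulation), a limb's 3D direction can be recovered from a rectified stereo pair by disparity-based back-projection of its two endpoints:

```python
import numpy as np

def limb_direction_stereo(pL1, pR1, pL2, pR2, f, b):
    """Unit 3D direction of a limb from a rectified stereo pair.
    Points are (x, y) pixel coordinates relative to the principal point,
    f is the focal length in pixels, b the baseline in metres."""
    def backproject(pL, pR):
        d = pL[0] - pR[0]          # horizontal disparity
        z = f * b / d              # depth from disparity
        return np.array([pL[0] * z / f, pL[1] * z / f, z])
    v = backproject(pL2, pR2) - backproject(pL1, pR1)
    return v / np.linalg.norm(v)
```

The point of such a representation is that limb orientation becomes directly observable from view geometry rather than having to be regressed from appearance alone.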
arXiv Detail & Related papers (2023-09-21T10:34:35Z)
- Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation [72.50214227616728]
Several methods have been proposed to learn image representations in a self-supervised fashion so as to disentangle appearance information from pose information.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy that generates natural appearance changes of the subject, to which a disentangled network should be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z)
- EgoHumans: An Egocentric 3D Multi-Human Benchmark [37.375846688453514]
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild.
We leverage consumer-grade wearable camera-equipped glasses for the egocentric view, which enables us to capture dynamic activities like playing tennis, fencing, volleyball, etc.
arXiv Detail & Related papers (2023-05-25T21:37:36Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
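The abstract does not specify what the scene constraints are. One plausible, hypothetical instance is a penetration penalty comparing each predicted joint's depth against a scene depth map rendered from the same camera (a pinhole model is used below purely for simplicity; the actual system uses a fisheye camera):

```python
import numpy as np

def scene_penetration_penalty(joints_cam, scene_depth, f, cx, cy):
    """Penalise joints predicted behind the scene surface. joints_cam is
    (J, 3) in camera coordinates; scene_depth is an (H, W) depth map of
    the scene from the same camera with pinhole intrinsics f, cx, cy."""
    penalty = 0.0
    H, W = scene_depth.shape
    for X, Y, Z in joints_cam:
        u = int(round(f * X / Z + cx))
        v = int(round(f * Y / Z + cy))
        if 0 <= u < W and 0 <= v < H:
            # positive only when the joint lies deeper than the scene surface
            penalty += max(0.0, Z - scene_depth[v, u])
    return penalty / len(joints_cam)
```

A term of this shape discourages physically impossible poses, e.g. a hand predicted inside a table the person is leaning on.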
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and weakly-supervised learning.
We propose to impose multi-view geometric constraints by means of differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
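Linear (DLT) triangulation can be sketched as follows; this is the standard algorithm, not necessarily the authors' exact implementation. In training, the SVD is kept differentiable and the triangulated point is reprojected into each view to serve as self-supervision:

```python
import numpy as np

def triangulate_dlt(points_2d, projections):
    """DLT triangulation of one 3D point from N >= 2 views.
    points_2d: (N, 2) pixel observations; projections: (N, 3, 4)
    camera projection matrices. Returns the 3D point minimising the
    algebraic error (smallest right singular vector of A)."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenise
```

Because every step (matrix assembly, SVD, division) is differentiable almost everywhere, the reprojection error of the triangulated point can be backpropagated into the 2D pose network.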
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- Enhancing Egocentric 3D Pose Estimation with Third Person Views [37.9683439632693]
We propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera.
We introduce First2Third-Pose, a new paired synchronized dataset of nearly 2,000 videos depicting human activities captured from both first- and third-view perspectives.
Experimental results demonstrate that the joint multi-view embedded space learned with our dataset is useful to extract discriminatory features from arbitrary single-view egocentric videos.
arXiv Detail & Related papers (2022-01-06T11:42:01Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fisheye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
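The summary says the decoder accounts for varying uncertainty in the 2D predictions; a common way to train such a head (assumed here, not taken from the paper) is a heteroscedastic loss in which each joint also predicts its own log-variance:

```python
import numpy as np

def uncertainty_weighted_loss(pred, target, log_sigma):
    """Per-joint 2D error weighted by a predicted uncertainty.
    pred, target: (J, 2) keypoints; log_sigma: (J,) predicted
    log-variances. Confident joints are penalised more for errors,
    while the +log_sigma term stops the network from declaring
    everything maximally uncertain."""
    sq_err = np.sum((pred - target) ** 2, axis=-1)     # (J,)
    return float(np.mean(np.exp(-log_sigma) * sq_err + log_sigma))
```

This lets the network down-weight self-occluded joints, which the fisheye viewpoint produces frequently, without discarding them entirely.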
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
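The abstract leaves the multi-view objective unspecified. One standard weakly-supervised consistency term (a sketch under that assumption, not the paper's method) aligns the 3D predictions from two views with a best-fit rotation (Kabsch) after centring, and penalises the residual:

```python
import numpy as np

def multiview_consistency_loss(pose_a, pose_b):
    """Residual joint error after optimally rotating/translating the pose
    predicted from view B onto the one from view A. pose_a, pose_b: (J, 3).
    Zero iff the two predictions agree up to a rigid transform, so it
    supervises pose shape without requiring any 3D labels."""
    a = pose_a - pose_a.mean(axis=0)
    b = pose_b - pose_b.mean(axis=0)
    U, _, Vt = np.linalg.svd(b.T @ a)          # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.mean(np.linalg.norm(a - b @ R.T, axis=1)))
```

Because the alignment absorbs the unknown relative camera pose, this loss can be applied to unlabeled, uncalibrated multi-view footage.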
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.