Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views
- URL: http://arxiv.org/abs/2309.11962v1
- Date: Thu, 21 Sep 2023 10:34:35 GMT
- Title: Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views
- Authors: Taeho Kang, Kyungjin Lee, Jinrui Zhang, Youngki Lee
- Abstract summary: Ego3DPose is a highly accurate binocular egocentric 3D pose reconstruction system.
We propose a two-path network architecture with a path that estimates pose per limb independently with its binocular heatmaps.
We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs.
- Score: 9.476008200056082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Ego3DPose, a highly accurate binocular egocentric 3D pose
reconstruction system. The binocular egocentric setup is practical and useful
in various applications; however, it remains largely under-explored. It has
suffered from low pose estimation accuracy due to viewing distortion, severe
self-occlusion, and the limited field of view of the joints in egocentric 2D
images. Here, we notice that two important 3D cues contained in the egocentric
binocular input, stereo correspondence and perspective, are neglected. Current
methods rely heavily on 2D image features, learning 3D information only
implicitly, which introduces a bias toward commonly observed motions and leads
to low overall accuracy. We observe that they fail not only in challenging
occlusion cases but also in estimating the positions of visible joints.
To address these challenges, we propose two novel approaches. First, we design
a two-path network architecture with a path that estimates pose per limb
independently, using only that limb's binocular heatmaps. Without full-body
information, this path alleviates the bias toward the full-body pose
distribution seen during training. Second, we leverage the egocentric view of
body limbs, which exhibits strong perspective variance (e.g., a hand appears
significantly larger when it is close to the camera).
camera). We propose a new perspective-aware representation using trigonometry,
enabling the network to estimate the 3D orientation of limbs. Finally, we
develop an end-to-end pose reconstruction network that synergizes both
techniques. Our comprehensive evaluations demonstrate that Ego3DPose
outperforms state-of-the-art models, reducing pose estimation error (MPJPE) by
23.1% on the UnrealEgo dataset. Our qualitative results highlight
the superiority of our approach across a range of scenarios and challenges.
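To make the first idea concrete, here is a hypothetical PyTorch sketch of a two-path design in the spirit described above: one path receives the stereo heatmaps of all joints, while the other receives only a single limb's heatmaps, so the limb estimate cannot lean on full-body pose statistics. The layer sizes, heatmap resolution, and fusion scheme are invented for illustration and are not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoPathPoseNet(nn.Module):
    """Toy two-path 3D pose regressor (illustrative only)."""

    def __init__(self, n_joints=16, limb_joints=2, hidden=128):
        super().__init__()
        self.n_joints = n_joints

        def path(in_channels):
            # Small conv encoder; both paths share this structure.
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(32 * 4 * 4, hidden), nn.ReLU())

        self.body_path = path(2 * n_joints)     # left+right heatmaps, all joints
        self.limb_path = path(2 * limb_joints)  # left+right heatmaps, one limb only
        self.head = nn.Linear(2 * hidden, 3 * n_joints)

    def forward(self, body_hm, limb_hm):
        # body_hm: (B, 2*n_joints, H, W); limb_hm: (B, 2*limb_joints, H, W)
        feats = torch.cat([self.body_path(body_hm),
                           self.limb_path(limb_hm)], dim=-1)
        return self.head(feats).view(-1, self.n_joints, 3)
```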
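The perspective cue and the reported metric can also be illustrated. Below is a minimal NumPy sketch, assuming an orthographic simplification for the foreshortening cue; the paper's actual perspective-aware, trigonometry-based representation is its own formulation, and the function names here are made up.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: the metric the abstract reports.
    pred, gt: (J, 3) arrays of 3D joint positions in the same units."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def limb_out_of_plane_angle(projected_len, true_len):
    """Toy foreshortening cue: under an orthographic simplification, a limb
    of known length true_len appears with projected length
    true_len * cos(theta), where theta is its angle out of the image plane.
    Inverting this ratio constrains the limb's 3D orientation (up to a sign
    ambiguity)."""
    ratio = np.clip(projected_len / true_len, 0.0, 1.0)
    return np.arccos(ratio)  # radians; 0 = limb lies in the image plane

# Example: a limb observed at half its true length is tilted roughly
# 60 degrees out of the image plane: limb_out_of_plane_angle(0.5, 1.0) ~= 1.047
```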
Related papers
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling [19.747618899243555]
We set our sights on a short-baseline binocular setting that offers both portability and a geometric measurement property.
As the binocular baseline shortens, serious challenges emerge; in particular, the robustness of 3D reconstruction against 2D errors deteriorates.
We propose the Stereo Co-Keypoints Estimation module to improve the view consistency of 2D keypoints and enhance the 3D robustness.
arXiv Detail & Related papers (2023-11-24T01:15:57Z)
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z)
- Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure, Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly-supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available (a minimal triangulation sketch follows this list).
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-01-20T00:45:13Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of body limbs by taking advantage of local image evidence, recovering the 3D pose from these orientations.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fish-eye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
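As a reference point for the triangulation-based entry above, here is textbook two-view linear (DLT) triangulation in NumPy. This is a hedged sketch of the standard geometric building block, not that paper's implementation; their contribution lies in using a differentiable version of such a step as a training signal.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Two-view linear triangulation (direct linear transform).
    P1, P2: (3, 4) camera projection matrices.
    x1, x2: (2,) pixel coordinates of the same joint in each view.
    Returns the 3D point minimizing the algebraic error."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # x-constraint from view 1
        x1[1] * P1[2] - P1[1],   # y-constraint from view 1
        x2[0] * P2[2] - P2[0],   # x-constraint from view 2
        x2[1] * P2[2] - P2[1],   # y-constraint from view 2
    ])
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

```

In an autodiff framework, the SVD (or an equivalent least-squares solve) is differentiable, which is what makes reprojection-consistency losses built on triangulation usable when no 3D labels are available.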