EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
- URL: http://arxiv.org/abs/2308.06493v3
- Date: Fri, 6 Sep 2024 11:28:04 GMT
- Title: EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
- Authors: Jiaxi Jiang, Paul Streli, Manuel Meier, Christian Holz
- Abstract summary: EgoPoser robustly models body pose from intermittent hand position and orientation tracking, which is available only while the hands are inside a headset's field of view.
We introduce a novel global motion decomposition method that predicts full-body pose independent of global positions.
We experimentally evaluate our method and show that it outperforms state-of-the-art methods both qualitatively and quantitatively.
- Score: 29.795731025552957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Full-body egocentric pose estimation from head and hand poses alone has become an active area of research to power articulate avatar representations on headset-based platforms. However, existing methods over-rely on the indoor motion-capture spaces in which datasets were recorded, while simultaneously assuming continuous joint motion capture and uniform body dimensions. We propose EgoPoser to overcome these limitations with four main contributions. 1) EgoPoser robustly models body pose from intermittent hand position and orientation tracking only when inside a headset's field of view. 2) We rethink input representations for headset-based ego-pose estimation and introduce a novel global motion decomposition method that predicts full-body pose independent of global positions. 3) We enhance pose estimation by capturing longer motion time series through an efficient SlowFast module design that maintains computational efficiency. 4) EgoPoser generalizes across various body shapes for different users. We experimentally evaluate our method and show that it outperforms state-of-the-art methods both qualitatively and quantitatively while maintaining a high inference speed of over 600fps. EgoPoser establishes a robust baseline for future work where full-body pose estimation no longer needs to rely on outside-in capture and can scale to large-scale and unseen environments.
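The first contribution in the abstract, hand tracking that is available only inside the headset's field of view, can be illustrated with a minimal sketch. This is not the authors' implementation: the cone-shaped field of view, the 45-degree half angle, the zero-fill masking, and the function names `in_field_of_view` and `mask_hand` are all assumptions made for illustration.

```python
import math


def in_field_of_view(head_pos, head_forward, hand_pos, half_angle_deg=45.0):
    """Return True if the hand lies inside a cone around the head's forward axis.

    Assumption: the field of view is approximated by a symmetric cone; a real
    headset has a camera frustum with device-specific angles.
    """
    to_hand = [h - p for h, p in zip(hand_pos, head_pos)]
    dist = math.sqrt(sum(c * c for c in to_hand))
    fwd_norm = math.sqrt(sum(c * c for c in head_forward))
    if dist == 0.0 or fwd_norm == 0.0:
        return False  # degenerate geometry: treat the hand as untracked
    dot = sum(a * b for a, b in zip(to_hand, head_forward))
    cos_angle = dot / (dist * fwd_norm)
    return cos_angle >= math.cos(math.radians(half_angle_deg))


def mask_hand(hand_features, visible):
    """Zero out hand features when the hand is untracked and append a visibility flag.

    Assumption: zero-filling plus a flag is one simple way to encode missing
    observations; the paper may encode them differently.
    """
    if visible:
        return list(hand_features) + [1.0]
    return [0.0] * len(hand_features) + [0.0]
```

Applying such a mask to the hand inputs of recorded motion data is one way a model could be exposed to intermittent observations during training, instead of assuming continuous hand tracking.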
Related papers
- Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data [16.431101717478796]
Current methods for ego-body pose estimation rely on temporally dense sensor data.
We develop a two-stage approach that decomposes the problem into temporal completion and spatial completion.
arXiv Detail & Related papers (2024-11-05T23:53:19Z)
- Estimating Body and Hand Motion in an Ego-sensed World [64.08911275906544]
We present EgoAllo, a system for human motion estimation from a head-mounted device.
Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
arXiv Detail & Related papers (2024-10-04T17:59:57Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-01-20T00:45:13Z)
- Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fisheye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Deep Reinforcement Learning for Active Human Pose Estimation [35.229529080763925]
We introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture.
We show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
arXiv Detail & Related papers (2020-01-07T13:35:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.