Estimating Body and Hand Motion in an Ego-sensed World
- URL: http://arxiv.org/abs/2410.03665v2
- Date: Thu, 17 Oct 2024 20:51:19 GMT
- Title: Estimating Body and Hand Motion in an Ego-sensed World
- Authors: Brent Yi, Vickie Ye, Maya Zheng, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa,
- Abstract summary: We present EgoAllo, a system for human motion estimation from a head-mounted device.
Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
- Score: 64.08911275906544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture the wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve the hands: the resulting kinematic and temporal constraints result in over 40% lower hand estimation errors compared to noisy monocular estimates. Project page: https://egoallo.github.io/
Related papers
- Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data [16.431101717478796]
Current methods for ego-body pose estimation rely on temporally dense sensor data.
We develop a two-stage approach that decomposes the problem into temporal completion and spatial completion.
arXiv Detail & Related papers (2024-11-05T23:53:19Z) - 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z) - HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z) - Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based
Motion Refinement [65.08165593201437]
We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
arXiv Detail & Related papers (2023-11-28T07:13:47Z) - EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere [29.795731025552957]
EgoPoser robustly models body pose from intermittent hand position and orientation tracking only when inside a headset's field of view.
We introduce a novel global motion decomposition method that predicts full-body pose independent of global positions.
We experimentally evaluate our method and show that it outperforms state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-12T07:46:50Z) - Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z) - Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure -- Ego-STAN.
Specifically, we build atemporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z) - Estimating Egocentric 3D Human Pose in the Wild with External Weak
Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-01-20T00:45:13Z) - SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward looking fish-eye cameras installed on the rim of a head mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.