Estimating Body and Hand Motion in an Ego-sensed World
- URL: http://arxiv.org/abs/2410.03665v3
- Date: Tue, 17 Dec 2024 18:39:00 GMT
- Title: Estimating Body and Hand Motion in an Ego-sensed World
- Authors: Brent Yi, Vickie Ye, Maya Zheng, Yunqi Li, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa
- Abstract summary: We present EgoAllo, a system for human motion estimation from a head-mounted device.
Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
- Score: 62.61989004520802
- Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture a device wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve hand estimation: the resulting kinematic and temporal constraints can reduce world-frame errors in single-frame estimates by 40%. Project page: https://egoallo.github.io/
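To make the invariance claim concrete, below is a minimal sketch (not EgoAllo's exact parameterization) of head-motion conditioning built from consecutive relative transforms, which is unchanged if the whole trajectory is rigidly re-positioned in the world (spatial invariance) or shifted in time (temporal invariance):

```python
import numpy as np

def invariant_head_conditioning(T_world_head: np.ndarray) -> np.ndarray:
    """Illustrative only: turn a sequence of world-frame head poses
    (N, 4, 4) into conditioning features that ignore the arbitrary
    world origin and sequence start time. EgoAllo's parameterization
    differs in detail."""
    feats = []
    for t in range(1, len(T_world_head)):
        # inv(G @ T[t-1]) @ (G @ T[t]) == inv(T[t-1]) @ T[t] for any
        # rigid transform G, so the feature is world-origin invariant.
        T_rel = np.linalg.inv(T_world_head[t - 1]) @ T_world_head[t]
        feats.append(np.concatenate([T_rel[:3, :3].ravel(), T_rel[:3, 3]]))
    return np.stack(feats)  # (N-1, 12): flattened rotation + translation
```

Conditioning on features like these, rather than on raw world-frame poses, is one way such a model can avoid overfitting to where and when a sequence happens to start.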
Related papers
- Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system.
Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z)
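One intuition behind HSfM's metric grounding can be shown with a toy rescaling step; the actual method optimizes humans, cameras, and points jointly, and `height_prior_m` below is an assumed constant:

```python
import numpy as np

def metric_rescale(cam_positions, points, person_heights_sfm,
                   height_prior_m=1.7):
    """Toy sketch, not HSfM itself: an up-to-scale SfM reconstruction
    can be grounded in meters by choosing the global scale that makes
    reconstructed people match a human height prior."""
    scale = height_prior_m / np.mean(person_heights_sfm)
    return cam_positions * scale, points * scale
```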
- Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data [16.431101717478796]
Current methods for ego-body pose estimation rely on temporally dense sensor data.
We develop a two-stage approach that decomposes the problem into temporal completion and spatial completion.
arXiv Detail & Related papers (2024-11-05T23:53:19Z)
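A minimal sketch of that two-stage split, where plain interpolation stands in for the learned temporal stage and `body_model` is a hypothetical learned callable:

```python
import numpy as np

def temporal_completion(t_obs, head_pos_obs, t_query):
    """Stage 1 (sketch): densify temporally sparse head positions.
    head_pos_obs: (K, 3) positions observed at increasing times t_obs (K,).
    The paper learns this completion; interpolation stands in here."""
    return np.stack([np.interp(t_query, t_obs, head_pos_obs[:, d])
                     for d in range(3)], axis=-1)

def spatial_completion(dense_head_pos, body_model):
    """Stage 2 (sketch): expand the completed head trajectory to
    full-body pose with a learned model (an abstract callable here)."""
    return body_model(dense_head_pos)
```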
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D Human Motion Recovery from Monocular Videos [5.258814754543826]
We propose a novel method for temporally consistent motion estimation from a monocular video.
Instead of using generic ResNet-like features, our method uses a body-aware feature representation and independent per-frame pose estimates.
Our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2023-11-20T10:53:59Z)
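For reference, the acceleration error reported by such methods is typically computed from second finite differences of joint trajectories; exact conventions (units, frame rate) vary by paper:

```python
import numpy as np

def accel_error(pred, gt, fps=30.0):
    """Mean L2 difference between predicted and ground-truth joint
    accelerations, a standard temporal-smoothness metric.
    pred, gt: (T, J, 3) joint positions in meters."""
    dt = 1.0 / fps
    a_pred = (pred[2:] - 2 * pred[1:-1] + pred[:-2]) / dt**2
    a_gt = (gt[2:] - 2 * gt[1:-1] + gt[:-2]) / dt**2
    return np.linalg.norm(a_pred - a_gt, axis=-1).mean()
```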
- EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere [29.795731025552957]
EgoPoser robustly models body pose from hand position and orientation tracking that is available only intermittently, when the hands are inside the headset's field of view.
We introduce a novel global motion decomposition method that predicts full-body pose independent of global positions.
We experimentally evaluate our method and show that it outperforms state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-12T07:46:50Z)
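The decomposition idea can be sketched as follows, with inputs re-expressed in a head-centric frame so the pose predictor never sees absolute world position; this is an illustration, not EgoPoser's exact formulation:

```python
import numpy as np

def head_relative_hands(head_pos, head_R, hand_pos):
    """Sketch of global motion decomposition: express hand positions
    in the head frame so the network's inputs are independent of where
    the user stands in the world.
    head_pos, hand_pos: (T, 3); head_R: (T, 3, 3) world-from-head rotations."""
    # R^T @ (p_hand - p_head): world-to-head transform of each position.
    return np.einsum("tij,tj->ti", head_R.transpose(0, 2, 1),
                     hand_pos - head_pos)
```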
- Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z)
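In pseudocode terms, the two-stage structure looks like the sketch below, where both stage models are abstract placeholders; the point is that head motion is the only interface between them, so each stage can be trained without paired egocentric video and body motion:

```python
def ego_ego(video, head_pose_estimator, body_from_head_model):
    """Two-stage sketch in the spirit of EgoEgo; both callables are
    hypothetical stand-ins for the paper's learned models."""
    head_motion = head_pose_estimator(video)          # stage 1: ego-head
    body_motion = body_from_head_model(head_motion)   # stage 2: ego-body
    return head_motion, body_motion
```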
- Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure -- Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z)
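A toy version of that design, attending over CNN feature-map tokens gathered across frames; layer sizes and the pooling scheme are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SpatioTemporalPose(nn.Module):
    """Minimal sketch of the Ego-STAN idea: a Transformer attends over
    per-frame CNN feature-map tokens across both space and time."""
    def __init__(self, joints=16, d=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, d, 7, stride=4, padding=3),
                                 nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(4))  # 4x4 feature map
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, joints * 3)

    def forward(self, frames):               # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1))   # (B*T, d, 4, 4)
        tokens = f.flatten(2).transpose(1, 2).reshape(B, T * 16, -1)
        enc = self.encoder(tokens)           # attend across space and time
        return self.head(enc.mean(dim=1)).view(B, -1, 3)  # (B, J, 3)
```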
- Neural Monocular 3D Human Motion Capture with Physical Awareness [76.55971509794598]
We present a new trainable system for physically plausible markerless 3D human motion capture.
Unlike most neural methods for human motion capture, our approach is aware of physical and environmental constraints.
It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes.
arXiv Detail & Related papers (2021-05-03T17:57:07Z)
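The paper enforces constraints with an explicit physics model; as a rough intuition for what "physically plausible" rules out, the sketch below scores two common artifacts that physics-unaware estimators produce, foot skating and jerky motion:

```python
import numpy as np

def plausibility_penalties(foot_pos, contact, fps=30.0):
    """Lightweight proxy, not the paper's method: penalize feet that
    slide while flagged as in contact, and overly jerky trajectories.
    foot_pos: (T, 3); contact: (T,) boolean contact labels."""
    vel = (foot_pos[1:] - foot_pos[:-1]) * fps
    skate = (np.linalg.norm(vel[contact[1:]], axis=-1).mean()
             if contact[1:].any() else 0.0)
    jerk = np.linalg.norm(np.diff(foot_pos, n=3, axis=0), axis=-1).mean()
    return skate, jerk
```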
- Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward looking fish-eye cameras installed on the rim of a head mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
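The uncertainty-aware decoder idea from SelfPose, in a minimal form; the class, its layers, and their sizes are illustrative assumptions rather than the paper's implementation (which predicts heatmaps rather than regressing coordinates directly):

```python
import torch
import torch.nn as nn

class MultiBranchDecoder(nn.Module):
    """Sketch of a multi-branch decoder: separate branches predict 2D
    joint locations and per-joint uncertainties, so the 3D branch can
    discount unreliable (e.g., self-occluded) joints."""
    def __init__(self, feat_dim=256, joints=16):
        super().__init__()
        self.pose2d = nn.Linear(feat_dim, joints * 2)
        self.log_sigma = nn.Linear(feat_dim, joints)  # per-joint uncertainty
        self.pose3d = nn.Linear(joints * 3, joints * 3)

    def forward(self, feat):                 # feat: (B, feat_dim)
        uv = self.pose2d(feat)               # (B, J*2) 2D predictions
        log_s = self.log_sigma(feat)         # (B, J) log-uncertainties
        x = torch.cat([uv, torch.exp(-log_s)], dim=-1)  # confidence-weighted
        return self.pose3d(x).view(feat.shape[0], -1, 3), uv, log_s
```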