EgoHumans: An Egocentric 3D Multi-Human Benchmark
- URL: http://arxiv.org/abs/2305.16487v2
- Date: Fri, 18 Aug 2023 23:28:45 GMT
- Title: EgoHumans: An Egocentric 3D Multi-Human Benchmark
- Authors: Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani
- Score: 37.375846688453514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present EgoHumans, a new multi-view multi-human video benchmark to advance
the state-of-the-art of egocentric human 3D pose estimation and tracking.
Existing egocentric benchmarks either capture a single subject or indoor-only
scenarios, which limit the generalization of computer vision algorithms for
real-world applications. We propose a novel 3D capture setup to construct a
comprehensive egocentric multi-human benchmark in the wild with annotations to
support diverse tasks such as human detection, tracking, 2D/3D pose estimation,
and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses
for the egocentric view, which enables us to capture dynamic activities like
playing tennis, fencing, volleyball, etc. Furthermore, our multi-view setup
generates accurate 3D ground truth even under severe or complete occlusion. The
dataset consists of more than 125k egocentric images, spanning diverse scenes
with a particular focus on challenging and unchoreographed multi-human
activities and fast-moving egocentric views. We rigorously evaluate existing
state-of-the-art methods and highlight their limitations in the egocentric
scenario, specifically on multi-human tracking. To address such limitations, we
propose EgoFormer, a novel approach with a multi-stream transformer
architecture and explicit 3D spatial reasoning to estimate and track the human
pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 on the
EgoHumans dataset.
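The occlusion robustness of the ground truth comes from the multi-view rig: a joint only needs to be seen by a subset of the surrounding cameras to be triangulated. As a rough illustration, here is a minimal sketch of standard DLT triangulation (an assumption about the mechanics, not the authors' exact pipeline):

```python
import numpy as np

def triangulate_joint(projections, observations):
    """Linear (DLT) triangulation of a single 3D joint.

    projections:  list of 3x4 camera matrices P_i = K_i [R_i | t_i]
    observations: list of (x, y) pixel detections, one per camera that
                  actually sees the joint (occluded views are simply omitted)
    """
    rows = []
    for P, (x, y) in zip(projections, observations):
        # Each observation contributes two linear constraints on the point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)                 # shape (2 * n_views, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                         # right null vector, homogeneous point
    return X[:3] / X[3]
```

Since any two unoccluded views suffice per joint, a person fully blocked in the egocentric frame can still be annotated in 3D from the remaining cameras.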
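For context, IDF1 is the identity-aware F1 score, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), so the 13.6% gain reflects fewer identity switches and better long-term association. The abstract describes EgoFormer only at the level of "multi-stream transformer with explicit 3D spatial reasoning", so the following PyTorch sketch is a hypothetical two-stream (appearance + 3D pose) tracker in that spirit; every module name and dimension is an assumption, not the published architecture:

```python
import torch
import torch.nn as nn

class TwoStreamTracker(nn.Module):
    """Hypothetical two-stream tracker: one transformer encodes per-detection
    appearance features, another encodes estimated 3D poses; cross-attention
    fuses the streams into identity embeddings used for association."""

    def __init__(self, d_model=256, n_heads=8, n_layers=2, n_joints=17,
                 app_dim=2048):
        super().__init__()
        self.app_proj = nn.Linear(app_dim, d_model)
        self.pose_proj = nn.Linear(n_joints * 3, d_model)
        self.app_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.pose_stream = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.fuse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, d_model)

    def forward(self, app_feats, poses_3d):
        # app_feats: (B, N, app_dim) appearance features for N detections
        # poses_3d:  (B, N, n_joints, 3) estimated 3D joints per detection
        a = self.app_stream(self.app_proj(app_feats))
        p = self.pose_stream(self.pose_proj(poses_3d.flatten(2)))
        fused, _ = self.fuse(query=p, key=a, value=a)  # poses attend to appearance
        return self.head(fused)                        # (B, N, d_model) embeddings

emb = TwoStreamTracker()(torch.randn(2, 4, 2048), torch.randn(2, 4, 17, 3))
```

Identity embeddings like these would then be matched across frames (e.g., Hungarian matching on cosine similarity) to form tracks.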
Related papers
- Ego3DT: Tracking Every 3D Object in Ego-centric Videos [20.96550148331019]
This paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in ego-centric videos.
We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment.
We also introduce a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos.
arXiv Detail & Related papers (2024-10-11T05:02:31Z)
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion.
arXiv Detail & Related papers (2022-12-09T02:25:20Z)
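EgoEgo's decomposition is worth spelling out: stage one recovers head motion from the egocentric video alone, and stage two generates full-body motion conditioned only on that head motion, so neither stage needs paired (egocentric video, body mocap) data. A toy sketch of the interface, with placeholder stages (the function names and stand-in logic are hypothetical, not the EgoEgo code):

```python
import numpy as np

def estimate_head_pose(ego_frames):
    """Stage 1 stand-in: recover per-frame head poses from the egocentric
    video (SLAM-style tracking in practice; placeholder values here)."""
    T = len(ego_frames)
    return {"rot": np.tile(np.eye(3), (T, 1, 1)), "trans": np.zeros((T, 3))}

def generate_body_from_head(head_motion, n_joints=17):
    """Stage 2 stand-in: a generative model conditioned only on head motion
    would go here; the placeholder hangs every joint at the head position."""
    return np.repeat(head_motion["trans"][:, None, :], n_joints, axis=1)

def estimate_body_pose(ego_frames):
    # The decomposition: egocentric video -> head motion -> full-body motion.
    # Neither stage needs paired (egocentric video, 3D body) training data.
    return generate_body_from_head(estimate_head_pose(ego_frames))

poses = estimate_body_pose(ego_frames=[None] * 30)   # (30, 17, 3)
```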
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z)
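The benefit of a stereo 2D keypoint module is that matched keypoints in the two views pin down depth directly. A minimal sketch for an idealized rectified pinhole stereo pair (an assumption for clarity; UnrealEgo's fisheye views would need undistortion and rectification first):

```python
import numpy as np

def stereo_keypoints_to_3d(kp_left, kp_right, f, baseline, cx, cy):
    """Lift matched 2D keypoints to 3D for a rectified pinhole stereo pair.

    kp_left, kp_right: (J, 2) pixel coordinates of the same J joints
    f:        focal length in pixels;  baseline: camera separation (meters)
    cx, cy:   principal point in pixels
    """
    disparity = kp_left[:, 0] - kp_right[:, 0]        # (J,)
    z = f * baseline / np.maximum(disparity, 1e-6)    # depth per joint
    x = (kp_left[:, 0] - cx) * z / f
    y = (kp_left[:, 1] - cy) * z / f
    return np.stack([x, y, z], axis=1)                # (J, 3), left-camera frame
```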
- Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-01-20T00:45:13Z)
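The weak supervision here is feature-level: egocentric features are pushed toward the features a frozen, pretrained external-view pose model produces. A minimal sketch of such a distillation term in PyTorch (the argument names are placeholders, not the paper's code):

```python
import torch
import torch.nn.functional as F

def weak_supervision_loss(ego_model, ext_model, ego_images, ext_images):
    """Align egocentric features with those of a frozen external-view pose
    model; assumes both models emit features of the same shape."""
    with torch.no_grad():                  # external model stays frozen
        target = ext_model(ext_images)     # high-quality external features
    pred = ego_model(ego_images)           # egocentric features to supervise
    return F.mse_loss(pred, target)
```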