EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars
- URL: http://arxiv.org/abs/2410.01835v2
- Date: Tue, 8 Oct 2024 23:01:58 GMT
- Title: EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars
- Authors: Jianchun Chen, Jian Wang, Yinda Zhang, Rohit Pandey, Thabo Beeler, Marc Habermann, Christian Theobalt
- Abstract summary: We propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
- Score: 56.56236652774294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Immersive VR telepresence ideally means being able to interact and communicate with digital avatars that are indistinguishable from and precisely reflect the behaviour of their real counterparts. The core technical challenge is twofold: creating a digital double that faithfully reflects the real human, and tracking the real human solely from egocentric sensing devices that are lightweight and have low energy consumption, e.g. a single RGB camera. To date, no unified solution to this problem exists, as recent works solely focus on egocentric motion capture, only model the head, or build avatars from multi-view captures. In this work, we, for the first time in the literature, propose a person-specific egocentric telepresence approach, which jointly models the photoreal digital avatar while also driving it from a single egocentric video. We first present a character model that is animatable, i.e. can be driven solely by skeletal motion, while being capable of modeling geometry and appearance. Then, we introduce a personalized egocentric motion capture component, which recovers full-body motion from an egocentric video. Finally, we apply the recovered pose to our character model and perform a test-time mesh refinement such that the geometry faithfully projects onto the egocentric view. To validate our design choices, we propose a new and challenging benchmark, which provides paired egocentric and dense multi-view videos of real humans performing various motions. Our experiments demonstrate a clear step towards egocentric and photoreal telepresence, as our method outperforms baselines as well as competing methods. For more details, code, and data, we refer to our project page.
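The abstract describes a three-stage pipeline: an animatable character model driven by skeletal motion, a personalized egocentric motion-capture component, and a test-time mesh refinement that aligns the geometry with the egocentric view. The following PyTorch sketch only illustrates that structure under assumed shapes and losses; the module names (EgoMotionCapture, AnimatableAvatar, test_time_refine), the toy projection, and the L2 reprojection objective are assumptions of this summary, not the authors' released code.

```python
# Minimal, schematic sketch of the three-stage pipeline described in the abstract.
# All module names, tensor shapes, and losses are illustrative assumptions.
import torch
import torch.nn as nn

NUM_JOINTS, NUM_VERTS = 24, 6890  # assumed skeleton and template sizes, not taken from the paper

class EgoMotionCapture(nn.Module):
    """Stage 1 (assumed): regress full-body skeletal pose from a single egocentric frame."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, NUM_JOINTS * 3)

    def forward(self, frame):                         # frame: (B, 3, H, W)
        return self.head(self.backbone(frame)).view(-1, NUM_JOINTS, 3)

class AnimatableAvatar(nn.Module):
    """Stage 2 (assumed): map skeletal motion to posed geometry of the personalized avatar."""
    def __init__(self):
        super().__init__()
        self.template = nn.Parameter(torch.zeros(NUM_VERTS, 3))   # canonical template mesh
        self.deform = nn.Linear(NUM_JOINTS * 3, NUM_VERTS * 3)    # pose-dependent deformation

    def forward(self, pose):                          # pose: (B, J, 3)
        offsets = self.deform(pose.flatten(1)).view(-1, NUM_VERTS, 3)
        return self.template + offsets                # posed vertices: (B, V, 3)

def test_time_refine(verts, project, observed_2d, steps=50, lr=1e-2):
    """Stage 3 (assumed): optimize per-vertex offsets so the geometry reprojects
    onto the egocentric view; a plain L2 reprojection loss stands in for the
    image evidence (keypoints, silhouettes) an actual system would use."""
    delta = torch.zeros_like(verts, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = ((project(verts + delta) - observed_2d) ** 2).mean()
        loss.backward()
        optimizer.step()
    return (verts + delta).detach()

# Usage with dummy data: one egocentric frame drives the avatar, and the mesh is
# then refined against (synthetic) 2D evidence from the same egocentric view.
frame = torch.rand(1, 3, 256, 256)
pose = EgoMotionCapture()(frame)
verts = AnimatableAvatar()(pose)
project = lambda v: v[..., :2] / (v[..., 2:3] + 2.0)   # toy pinhole-style projection
refined = test_time_refine(verts.detach(), project, torch.rand(1, NUM_VERTS, 2))
```

In the actual system the refinement step would use the real egocentric camera model and image observations rather than the toy projection and random targets used here.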
Related papers
- EgoGen: An Egocentric Synthetic Data Generator [53.32942235801499]
EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.
At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment.
We demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views.
arXiv Detail & Related papers (2024-01-16T18:55:22Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Physics-based Motion Retargeting from Sparse Inputs [73.94570049637717]
Commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose.
We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies.
We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available.
arXiv Detail & Related papers (2023-07-04T21:57:05Z)
- EgoHumans: An Egocentric 3D Multi-Human Benchmark [37.375846688453514]
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking.
We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild.
We leverage consumer-grade glasses equipped with wearable cameras for the egocentric view, which enables us to capture dynamic activities such as playing tennis, fencing, and volleyball.
arXiv Detail & Related papers (2023-05-25T21:37:36Z)
- Ego-Body Pose Estimation via Ego-Head Pose Estimation [22.08240141115053]
Estimating 3D human motion from an egocentric video sequence plays a critical role in human behavior understanding and has various applications in VR/AR.
We propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation.
This disentanglement of head and body pose eliminates the need for training datasets with paired egocentric videos and 3D human motion; a schematic sketch of this two-stage structure is given after this list.
arXiv Detail & Related papers (2022-12-09T02:25:20Z)
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture [70.59984501516084]
UnrealEgo is a new large-scale naturalistic dataset for egocentric 3D human pose estimation.
It is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments.
We propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation.
arXiv Detail & Related papers (2022-08-02T17:59:54Z)
- EgoRenderer: Rendering Human Avatars from Egocentric Camera Images [87.96474006263692]
We present EgoRenderer, a system for rendering full-body neural avatars of a person captured by a wearable, egocentric fisheye camera.
Rendering full-body avatars from such egocentric images comes with unique challenges due to the top-down view and large distortions.
We tackle these challenges by decomposing the rendering process into several steps, including texture synthesis, pose construction, and neural image translation.
arXiv Detail & Related papers (2021-11-24T18:33:02Z)
- 4D Human Body Capture from Egocentric Video via 3D Scene Grounding [38.3169520384642]
We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos.
The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers for human body capture.
arXiv Detail & Related papers (2020-11-26T15:17:16Z)
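The Ego-Body Pose Estimation (EgoEgo) entry above decomposes egocentric body-pose estimation into two stages connected by head motion as an intermediate representation. The sketch below illustrates only that two-stage structure; the module names, shapes, and the GRU-based body decoder are assumptions made for illustration, not the paper's implementation.

```python
# Schematic two-stage decomposition in the spirit of the EgoEgo entry above:
# (1) estimate head motion from egocentric frames, (2) predict full-body pose
# conditioned on the head-motion sequence. Names and shapes are assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 24  # assumed body skeleton size

class HeadPoseEstimator(nn.Module):
    """Stage 1 (assumed): egocentric frames -> per-frame head pose (rotation + translation)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 6),                          # 3D axis-angle rotation + 3D translation
        )

    def forward(self, frames):                        # frames: (T, 3, H, W)
        return self.encoder(frames)                   # head poses: (T, 6)

class BodyFromHead(nn.Module):
    """Stage 2 (assumed): head-motion sequence -> full-body joint rotations."""
    def __init__(self):
        super().__init__()
        self.temporal = nn.GRU(input_size=6, hidden_size=64, batch_first=True)
        self.decoder = nn.Linear(64, NUM_JOINTS * 3)

    def forward(self, head_poses):                    # head_poses: (T, 6)
        feats, _ = self.temporal(head_poses.unsqueeze(0))
        return self.decoder(feats).view(-1, NUM_JOINTS, 3)   # (T, J, 3)

# Usage with dummy data: the head trajectory is the only interface between the
# two stages, which is what allows them to be trained without paired data.
frames = torch.rand(16, 3, 128, 128)       # 16 egocentric frames
head_traj = HeadPoseEstimator()(frames)    # (16, 6)
body_pose = BodyFromHead()(head_traj)      # (16, 24, 3)
```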