Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual
Reality
- URL: http://arxiv.org/abs/2104.04794v1
- Date: Sat, 10 Apr 2021 15:48:53 GMT
- Title: Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual
Reality
- Authors: Amin Jourabloo, Fernando De la Torre, Jason Saragih, Shih-En Wei,
Te-Li Wang, Stephen Lombardi, Danielle Belko, Autumn Trimble, Hernan Badino
- Abstract summary: Social presence will fuel the next generation of communication systems driven by digital humans in virtual reality (VR).
The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models.
This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture.
- Score: 68.18446501943585
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social presence, the feeling of being there with a real person, will fuel the
next generation of communication systems driven by digital humans in virtual
reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny
effect rely on person-specific (PS) models. However, these PS models are
time-consuming to build and are typically trained with limited data
variability, which results in poor generalization and robustness. Major sources
of variability that affect the accuracy of facial expression transfer
algorithms include using different VR headsets (e.g., camera configuration,
slop of the headset), facial appearance changes over time (e.g., beard,
make-up), and environmental factors (e.g., lighting, backgrounds). This is a
major drawback for the scalability of these models in VR. This paper makes
progress in overcoming these limitations by proposing an end-to-end
multi-identity architecture (MIA) trained with specialized augmentation
strategies. MIA drives the shape component of the avatar from three cameras in
the VR headset (two eyes, one mouth), in untrained subjects, using minimal
personalized information (i.e., neutral 3D mesh shape). Similarly, if the PS
texture decoder is available, MIA is able to drive the full avatar
(shape+texture), robustly outperforming PS models in challenging scenarios. Our
key contribution to improving robustness and generalization is that our method
implicitly decouples, in an unsupervised manner, the facial expression from
nuisance factors (e.g., headset, environment, facial appearance). We
demonstrate the superior performance and robustness of the proposed method
versus state-of-the-art PS approaches in a variety of experiments.
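To make the data flow described above concrete, the following is a minimal PyTorch sketch of a multi-identity encoder in the spirit of MIA: the three headset camera crops (left eye, right eye, mouth) and the subject's neutral 3D mesh are fused into an expression code, and a shape decoder turns that code into per-vertex offsets from the neutral mesh. This is not the authors' implementation; every module name, layer size, and the plain linear shape decoder are illustrative assumptions, and the paper's augmentation strategies and PS texture decoder are omitted.

```python
import torch
import torch.nn as nn


class HMCBranch(nn.Module):
    """Small CNN applied to one headset camera crop (left eye, right eye, or mouth)."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiIdentityEncoder(nn.Module):
    """Fuses the three camera features with an identity code derived from the
    subject's neutral 3D mesh, then predicts (a) an expression code intended to
    be free of identity/nuisance factors and (b) per-vertex offsets from neutral."""

    def __init__(self, n_vertices: int, expr_dim: int = 64, feat_dim: int = 128):
        super().__init__()
        self.eye_l = HMCBranch(feat_dim)
        self.eye_r = HMCBranch(feat_dim)
        self.mouth = HMCBranch(feat_dim)
        self.id_embed = nn.Linear(n_vertices * 3, feat_dim)   # neutral mesh -> identity code
        self.fuse = nn.Sequential(
            nn.Linear(4 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, expr_dim),
        )
        self.shape_decoder = nn.Linear(expr_dim + feat_dim, n_vertices * 3)

    def forward(self, eye_l, eye_r, mouth, neutral_mesh):
        identity = self.id_embed(neutral_mesh.flatten(1))      # (B, feat_dim)
        feats = torch.cat(
            [self.eye_l(eye_l), self.eye_r(eye_r), self.mouth(mouth), identity], dim=-1
        )
        expr_code = self.fuse(feats)                           # expression code
        offsets = self.shape_decoder(torch.cat([expr_code, identity], dim=-1))
        return neutral_mesh + offsets.view_as(neutral_mesh), expr_code


# Toy usage: batch of 2 subjects, 192x192 monochrome HMC crops, 7306-vertex mesh
# (all sizes are placeholders).
enc = MultiIdentityEncoder(n_vertices=7306)
crops = [torch.randn(2, 1, 192, 192) for _ in range(3)]
driven_mesh, expr = enc(*crops, torch.randn(2, 7306, 3))
print(driven_mesh.shape, expr.shape)   # torch.Size([2, 7306, 3]) torch.Size([2, 64])
```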
Related papers
- Universal Facial Encoding of Codec Avatars from VR Headsets [32.60236093340087]
We present a method that can animate a photorealistic avatar in realtime from head-mounted cameras (HMCs) on a consumer VR headset.
We present a lightweight expression calibration mechanism that increases accuracy with minimal additional cost to run-time efficiency.
arXiv Detail & Related papers (2024-07-17T22:08:15Z)
- GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar [48.21353924040671]
We propose to learn person-specific animatable avatars from images without assuming access to precise facial expression tracking.
We learn a mapping from 3DMM facial expression parameters to the latent space of the generative model.
With this scheme, we decouple 3D appearance reconstruction and animation control to achieve high fidelity in image synthesis; a hedged sketch of such an expression-to-latent mapping appears after this list.
arXiv Detail & Related papers (2023-11-22T19:13:00Z)
- Pixel Codec Avatars [99.36561532588831]
Pixel Codec Avatars (PiCA) is a deep generative model of 3D human faces.
On a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in realtime in the same scene.
arXiv Detail & Related papers (2021-04-09T23:17:36Z)
- High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation [117.32310997522394]
3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR.
Existing person-specific 3D models are not robust to lighting; hence, their results typically miss subtle facial behaviors and cause artifacts in the avatar.
This paper addresses these limitations by learning a deep learning-based lighting model that, in combination with a high-quality 3D face tracking algorithm, provides subtle and robust facial motion transfer from regular video to a 3D photo-realistic avatar.
arXiv Detail & Related papers (2021-03-29T18:33:49Z)
- Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
- Expressive Telepresence via Modular Codec Avatars [148.212743312768]
VR telepresence consists of interacting with another human in a virtual space represented by an avatar.
This paper aims in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in the VR headset.
MCA extends traditional Codec Avatars (CA) by replacing the holistic models with a learned modular representation.
arXiv Detail & Related papers (2020-08-26T20:16:43Z)
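The GAN-Avatar entry above describes learning a mapping from 3DMM facial expression parameters to the latent space of a pretrained person-specific generator. Below is a hedged, hypothetical PyTorch sketch of that idea, not the paper's released code: a small MLP is trained to produce latent codes while the generator stays frozen, so appearance comes from the generator and animation control comes from the mapper. Parameter counts, layer sizes, and the training step are assumptions.

```python
import torch
import torch.nn as nn


class ExpressionToLatent(nn.Module):
    """Hypothetical MLP mapping 3DMM expression parameters to a generator latent code."""

    def __init__(self, n_expr_params: int = 100, latent_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_expr_params, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, latent_dim),
        )

    def forward(self, expr_params: torch.Tensor) -> torch.Tensor:
        return self.mlp(expr_params)


def mapper_step(mapper, generator, expr_params, target_image, loss_fn, optimizer):
    """One training step: the generator's weights stay frozen (appearance is baked in),
    while gradients flow through it to update only the expression-to-latent mapper."""
    for p in generator.parameters():
        p.requires_grad_(False)
    latent = mapper(expr_params)          # 3DMM expression params -> latent code
    pred = generator(latent)              # frozen, pretrained person-specific generator
    loss = loss_fn(pred, target_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```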
This list is automatically generated from the titles and abstracts of the papers on this site.