Attention-Based VR Facial Animation with Visual Mouth Camera Guidance
for Immersive Telepresence Avatars
- URL: http://arxiv.org/abs/2312.09750v1
- Date: Fri, 15 Dec 2023 12:45:11 GMT
- Title: Attention-Based VR Facial Animation with Visual Mouth Camera Guidance
for Immersive Telepresence Avatars
- Authors: Andre Rochow, Max Schwarz, Sven Behnke
- Abstract summary: We present a hybrid method that uses both keypoints and direct visual guidance from a mouth camera.
Our method generalizes to unseen operators and requires only a quick enrolment step with capture of two short videos.
We highlight how the facial animation contributed to our victory at the ANA Avatar XPRIZE Finals.
- Score: 19.70403947793871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial animation in virtual reality environments is essential for
applications that necessitate clear visibility of the user's face and the
ability to convey emotional signals. In our scenario, we animate the face of an
operator who controls a robotic Avatar system. The use of facial animation is
particularly valuable when the perception of interacting with a specific
individual, rather than just a robot, is intended. Purely keypoint-driven
animation approaches struggle with the complexity of facial movements. We
present a hybrid method that uses both keypoints and direct visual guidance
from a mouth camera. Our method generalizes to unseen operators and requires
only a quick enrolment step with capture of two short videos. Multiple source
images are selected with the intention to cover different facial expressions.
Given a mouth camera frame from the HMD, we dynamically construct the target
keypoints and apply an attention mechanism to determine the importance of each
source image. To resolve keypoint ambiguities and animate a broader range of
mouth expressions, we propose to inject visual mouth camera information into
the latent space. We enable training on large-scale speaking head datasets by
simulating the mouth camera input with its perspective differences and facial
deformations. Our method outperforms a baseline in quality, capability, and
temporal consistency. In addition, we highlight how the facial animation
contributed to our victory at the ANA Avatar XPRIZE Finals.
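
The per-frame attention over enrolment source images can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch illustration (module names, tensor shapes, and dimensions are assumptions, not the authors' code): the dynamically constructed target keypoints form the query, the keypoints of the K enrolment images form the keys, and the resulting weights blend the source feature maps before decoding.

```python
# Hypothetical sketch of attention-weighted source-image selection.
# Not the published implementation; dimensions and projections are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceAttention(nn.Module):
    """Weights K enrolment source images by their relevance to the
    dynamically constructed target keypoints (one query per frame)."""
    def __init__(self, kp_dim: int = 2 * 68, d_model: int = 128):
        super().__init__()
        self.query_proj = nn.Linear(kp_dim, d_model)  # target keypoints -> query
        self.key_proj = nn.Linear(kp_dim, d_model)    # source keypoints -> keys
        self.scale = d_model ** -0.5

    def forward(self, target_kp, source_kp, source_feats):
        # target_kp:    (B, kp_dim)      keypoints built from the HMD mouth frame
        # source_kp:    (B, K, kp_dim)   keypoints of the K enrolment images
        # source_feats: (B, K, C, H, W)  encoder features of the K enrolment images
        q = self.query_proj(target_kp).unsqueeze(1)                      # (B, 1, d)
        k = self.key_proj(source_kp)                                     # (B, K, d)
        attn = F.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)   # (B, 1, K)
        # Blend the source feature maps according to the attention weights
        B, K, C, H, W = source_feats.shape
        fused = (attn.view(B, K, 1, 1, 1) * source_feats).sum(dim=1)     # (B, C, H, W)
        return fused, attn.squeeze(1)
```

Features extracted from the mouth camera frame could then be concatenated to the fused representation before the generator, which is one plausible reading of the latent-space injection described in the abstract.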
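The training-time simulation of the HMD mouth camera can likewise be sketched as a simple augmentation: crop the mouth region of a talking-head frame and apply a random perspective warp so the crop resembles the oblique mouth-camera view. The function below is an illustrative guess at such an augmentation (crop margin, jitter range, and output size are assumptions), not the published preprocessing.

```python
# Hypothetical augmentation: simulate an HMD mouth-camera view from a
# standard talking-head frame (mouth crop + random perspective jitter).
import cv2
import numpy as np

def simulate_mouth_camera(frame, mouth_kp, out_size=128, max_jitter=0.15, rng=None):
    """frame: (H, W, 3) uint8 image; mouth_kp: (N, 2) mouth landmarks in pixels."""
    rng = rng or np.random.default_rng()
    # Square crop box around the mouth landmarks with some margin
    x0, y0 = mouth_kp.min(axis=0)
    x1, y1 = mouth_kp.max(axis=0)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = 0.9 * max(x1 - x0, y1 - y0)
    src = np.float32([[cx - half, cy - half], [cx + half, cy - half],
                      [cx + half, cy + half], [cx - half, cy + half]])
    # Perturb the crop corners to mimic the oblique mouth-camera perspective
    src += (rng.uniform(-max_jitter, max_jitter, size=(4, 2)) * 2 * half).astype(np.float32)
    dst = np.float32([[0, 0], [out_size, 0], [out_size, out_size], [0, out_size]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, M, (out_size, out_size))
```

Applied to frames of a large talking-head dataset, a warp of this kind yields pseudo mouth-camera inputs with perspective differences and facial deformations, so the model never needs paired HMD recordings during training.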
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that allows capturing the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- Universal Facial Encoding of Codec Avatars from VR Headsets [32.60236093340087]
We present a method that can animate a photorealistic avatar in realtime from head-mounted cameras (HMCs) on a consumer VR headset.
We present a lightweight expression calibration mechanism that increases accuracy with minimal additional cost to run-time efficiency.
arXiv Detail & Related papers (2024-07-17T22:08:15Z) - VR Facial Animation for Immersive Telepresence Avatars [25.506570225219406]
VR facial animation is necessary in applications that require a clear view of the face even though a VR headset is worn.
We propose a real-time capable pipeline with very fast adaptation for specific operators.
We demonstrate an eye tracking pipeline that can be trained in less than a minute.
arXiv Detail & Related papers (2023-04-24T12:43:51Z) - Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z) - FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation
Synthesis Using Self-Supervised Speech Representation Learning [0.0]
FaceXHuBERT is a text-less speech-driven 3D facial animation generation method.
It is very robust to background noise and can handle audio recorded in a variety of situations.
It produces superior results with respect to the realism of the animation 78% of the time.
arXiv Detail & Related papers (2023-03-09T17:05:19Z) - Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z) - MeshTalk: 3D Face Animation from Speech using Cross-Modality
Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also producing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z) - Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z) - Identity-Preserving Realistic Talking Face Generation [4.848016645393023]
We propose a method for identity-preserving realistic facial animation from speech.
We impose eye blinks on facial landmarks using unsupervised learning.
We also use LSGAN to generate the facial texture from person-specific facial landmarks.
arXiv Detail & Related papers (2020-05-25T18:08:28Z) - Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)