Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations
- URL: http://arxiv.org/abs/2304.08945v1
- Date: Tue, 18 Apr 2023 12:36:15 GMT
- Title: Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations
- Authors: Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang,
Shijian Lu
- Abstract summary: DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate a fair range of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can effectively generate talking faces with realistic facial animations.
- Score: 61.65012981435094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio-driven talking face generation, which aims to synthesize talking faces
with realistic facial animations (including accurate lip movements, vivid
facial expression details and natural head poses) corresponding to the audio,
has achieved rapid progress in recent years. However, most existing work
focuses on generating lip movements only without handling the closely
correlated facial expressions, which degrades the realism of the generated
faces greatly. This paper presents DIRFA, a novel method that can generate
talking faces with diverse yet realistic facial animations from the same
driving audio. To accommodate a fair range of plausible facial animations for
the same audio, we design a transformer-based probabilistic mapping network
that can model the variational facial animation distribution conditioned upon
the input audio and autoregressively convert the audio signals into a facial
animation sequence. In addition, we introduce a temporally-biased mask into the
mapping network, which allows the model to capture the temporal dependency of facial
animations and produce temporally smooth facial animation sequences. With the
generated facial animation sequence and a source image, photo-realistic talking
faces can be synthesized with a generic generation network. Extensive
experiments show that DIRFA can effectively generate talking faces with realistic
facial animations.
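
The abstract describes the mapping network only at a high level. As a hedged illustration (not the authors' implementation), the sketch below shows one way a transformer decoder could autoregressively map audio features to facial-animation parameters under a temporally-biased, causal attention mask; the linear decay, the Gaussian output head, and all names and dimensions (audio_dim, anim_dim, d_model) are assumptions made for this example, written in PyTorch. Positional encodings and training losses are omitted for brevity.

import torch
import torch.nn as nn


def temporally_biased_mask(num_frames: int, decay: float = 0.1) -> torch.Tensor:
    # Additive attention mask: causal (no access to future frames), with a
    # penalty that grows with temporal distance so each frame attends mainly
    # to its recent past. The linear decay rate is an assumption.
    i = torch.arange(num_frames).unsqueeze(1)  # query (current frame) index
    j = torch.arange(num_frames).unsqueeze(0)  # key (attended frame) index
    bias = -decay * (i - j).clamp(min=0).float()
    return bias.masked_fill(j > i, float("-inf"))  # shape (T, T)


class AudioToAnimationMapper(nn.Module):
    # Hypothetical transformer-based mapping network: audio features serve as
    # the decoder memory, previously generated animation frames as the target,
    # and the head predicts a per-frame Gaussian so that sampling yields
    # diverse yet plausible animation sequences for the same audio.
    def __init__(self, audio_dim=128, anim_dim=70, d_model=256,
                 n_heads=4, n_layers=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.anim_proj = nn.Linear(anim_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2 * anim_dim)  # mean and log-variance

    def forward(self, audio_feats, prev_anim):
        # audio_feats: (B, T, audio_dim); prev_anim: (B, T, anim_dim),
        # i.e. the animation sequence shifted right by one frame.
        mask = temporally_biased_mask(prev_anim.shape[1]).to(prev_anim.device)
        h = self.decoder(tgt=self.anim_proj(prev_anim),
                         memory=self.audio_proj(audio_feats),
                         tgt_mask=mask)
        mean, log_var = self.head(h).chunk(2, dim=-1)
        return mean + torch.randn_like(mean) * (0.5 * log_var).exp()

Sampling from the per-frame Gaussian, rather than regressing a single deterministic output, is what would let the same driving audio yield diverse animation sequences; the -inf entries in the mask keep the mapping causal, while the distance penalty biases each frame toward its recent past and encourages temporally smooth output.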
Related papers
- SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation [13.459396544300137]
We propose a novel one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from the general Talking Face Generation.
We introduce Inter-Reconstructed Feature Disentanglement (IRFD) module to decouple facial features into three latent spaces.
We then design a face editing module that combines speech content and facial latent codes into a single latent space.
arXiv Detail & Related papers (2024-05-12T11:41:44Z)
- Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape [19.431264557873117]
We introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation.
We explicitly disentangle facial animation into head pose and mouth movement and encode them separately.
We construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content.
arXiv Detail & Related papers (2023-10-31T07:47:19Z)
- DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
At the same time, it achieves more realistic facial animation than state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio, without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Identity-Preserving Realistic Talking Face Generation [4.848016645393023]
We propose a method for identity-preserving realistic facial animation from speech.
We impose eye blinks on facial landmarks using unsupervised learning.
We also use LSGAN to generate the facial texture from person-specific facial landmarks.
arXiv Detail & Related papers (2020-05-25T18:08:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.