Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations
- URL: http://arxiv.org/abs/2304.08945v1
- Date: Tue, 18 Apr 2023 12:36:15 GMT
- Title: Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations
- Authors: Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang,
Shijian Lu
- Abstract summary: DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate a fair range of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can effectively generate talking faces with realistic facial animations.
- Score: 61.65012981435094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio-driven talking face generation, which aims to synthesize talking faces
with realistic facial animations (including accurate lip movements, vivid
facial expression details and natural head poses) corresponding to the audio,
has achieved rapid progress in recent years. However, most existing work
focuses on generating lip movements only without handling the closely
correlated facial expressions, which degrades the realism of the generated
faces greatly. This paper presents DIRFA, a novel method that can generate
talking faces with diverse yet realistic facial animations from the same
driving audio. To accommodate a fair range of plausible facial animations for
the same audio, we design a transformer-based probabilistic mapping network
that can model the variational facial animation distribution conditioned upon
the input audio and autoregressively convert the audio signals into a facial
animation sequence. In addition, we introduce a temporally-biased mask into the
mapping network, which allows the model to capture the temporal dependency of facial
animations and produce temporally smooth facial animation sequences. With the
generated facial animation sequence and a source image, photo-realistic talking
faces can be synthesized with a generic generation network. Extensive
experiments show that DIRFA can effectively generate talking faces with realistic
facial animations.
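
The abstract describes the mapping network only at a high level. As a hedged illustration (not the authors' implementation), the sketch below shows one way a transformer decoder could autoregressively map audio features to facial-animation parameters under a temporally-biased, causal attention mask; the linear decay, the Gaussian output head, and all names and dimensions (audio_dim, anim_dim, d_model) are assumptions made for this example, written in PyTorch. Positional encodings and training losses are omitted for brevity.

import torch
import torch.nn as nn


def temporally_biased_mask(num_frames: int, decay: float = 0.1) -> torch.Tensor:
    # Additive attention mask: causal (no access to future frames), with a
    # penalty that grows with temporal distance so each frame attends mainly
    # to its recent past. The linear decay rate is an assumption.
    i = torch.arange(num_frames).unsqueeze(1)  # query (current frame) index
    j = torch.arange(num_frames).unsqueeze(0)  # key (attended frame) index
    bias = -decay * (i - j).clamp(min=0).float()
    return bias.masked_fill(j > i, float("-inf"))  # shape (T, T)


class AudioToAnimationMapper(nn.Module):
    # Hypothetical transformer-based mapping network: audio features serve as
    # the decoder memory, previously generated animation frames as the target,
    # and the head predicts a per-frame Gaussian so that sampling yields
    # diverse yet plausible animation sequences for the same audio.
    def __init__(self, audio_dim=128, anim_dim=70, d_model=256,
                 n_heads=4, n_layers=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.anim_proj = nn.Linear(anim_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2 * anim_dim)  # mean and log-variance

    def forward(self, audio_feats, prev_anim):
        # audio_feats: (B, T, audio_dim); prev_anim: (B, T, anim_dim),
        # i.e. the animation sequence shifted right by one frame.
        mask = temporally_biased_mask(prev_anim.shape[1]).to(prev_anim.device)
        h = self.decoder(tgt=self.anim_proj(prev_anim),
                         memory=self.audio_proj(audio_feats),
                         tgt_mask=mask)
        mean, log_var = self.head(h).chunk(2, dim=-1)
        return mean + torch.randn_like(mean) * (0.5 * log_var).exp()

Sampling from the per-frame Gaussian, rather than regressing a single deterministic output, is what would let the same driving audio yield diverse animation sequences; the -inf entries in the mask keep the mapping causal, while the distance penalty biases each frame toward its recent past and encourages temporally smooth output.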
Related papers
- SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation [13.459396544300137]
We propose a novel one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from the general Talking Face Generation.
We introduce Inter-Reconstructed Feature Disentanglement (IRFD) module to decouple facial features into three latent spaces.
We then design a face editing module that combines speech content and facial latent codes into a single latent space.
arXiv Detail & Related papers (2024-05-12T11:41:44Z)
- Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape [19.431264557873117]
We introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation.
We explicitly disentangle facial animation into head pose and mouth movement and encode them separately.
We construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content.
arXiv Detail & Related papers (2023-10-31T07:47:19Z)
- DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
At the same time, it achieves more realistic facial animation than state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio, without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Identity-Preserving Realistic Talking Face Generation [4.848016645393023]
We propose a method for identity-preserving realistic facial animation from speech.
We impose eye blinks on facial landmarks using unsupervised learning.
We also use LSGAN to generate the facial texture from person-specific facial landmarks.
arXiv Detail & Related papers (2020-05-25T18:08:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.