Breathing Life into Faces: Speech-driven 3D Facial Animation with
Natural Head Pose and Detailed Shape
- URL: http://arxiv.org/abs/2310.20240v1
- Date: Tue, 31 Oct 2023 07:47:19 GMT
- Title: Breathing Life into Faces: Speech-driven 3D Facial Animation with
Natural Head Pose and Detailed Shape
- Authors: Wei Zhao, Yijun Wang, Tianyu He, Lianying Yin, Jianxin Lin, Xin Jin
- Abstract summary: We introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation.
We explicitly disentangle facial animation into head pose and mouth movement and encode them separately.
We construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content.
- Score: 19.431264557873117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The creation of lifelike speech-driven 3D facial animation requires a natural
and precise synchronization between audio input and facial expressions.
However, existing works still fail to render shapes with flexible head poses
and natural facial details (e.g., wrinkles). This limitation is mainly due to
two aspects: 1) Collecting a training set with detailed 3D facial shapes is
highly expensive. This scarcity of detailed shape annotations hinders the
training of models with expressive facial animation. 2) Compared to mouth
movement, head pose is much less correlated with speech content.
Consequently, modeling mouth movement and head pose jointly results in a lack
of controllability over facial movement. To address these challenges, we
introduce VividTalker, a new framework designed to facilitate speech-driven 3D
facial animation characterized by flexible head pose and natural facial
details. Specifically, we explicitly disentangle facial animation into head
pose and mouth movement and encode them separately into discrete latent spaces.
Then, these attributes are generated through an autoregressive process
leveraging a window-based Transformer architecture. To augment the richness of
3D facial animation, we construct a new 3D dataset with detailed shapes and
learn to synthesize facial details in line with speech content. Extensive
quantitative and qualitative experiments demonstrate that VividTalker
outperforms state-of-the-art methods, resulting in vivid and realistic
speech-driven 3D facial animation.
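To make the architecture described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of the two core ideas: mouth movement and head pose are quantized by separate VQ-style codebooks, so each attribute lives in its own discrete latent space, and a Transformer decoder with a sliding-window causal mask then generates the fused token sequence autoregressively, conditioned on speech features. All module names, dimensions, and the window size are illustrative assumptions.

```python
# Minimal sketch (an assumption, not VividTalker's actual implementation):
# separate discrete latent spaces for mouth movement and head pose, plus a
# window-based autoregressive Transformer conditioned on speech features.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Nearest-neighbour vector quantization (VQ-VAE style) for one attribute."""

    def __init__(self, num_codes: int = 256, dim: int = 128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, frames, dim) continuous features from an attribute encoder
        flat = z.reshape(-1, z.size(-1))                      # (B*T, dim)
        dist = torch.cdist(flat, self.codebook.weight)        # (B*T, num_codes)
        indices = dist.argmin(dim=-1).view(z.shape[:-1])      # (B, T) discrete tokens
        quantized = self.codebook(indices)                    # (B, T, dim)
        # Straight-through estimator so gradients still reach the encoder
        quantized = z + (quantized - z).detach()
        return quantized, indices


class WindowedARDecoder(nn.Module):
    """Autoregressive Transformer that attends only over a sliding window of the past."""

    def __init__(self, dim: int = 128, window: int = 32, heads: int = 4, layers: int = 4):
        super().__init__()
        self.window = window
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)

    def _window_causal_mask(self, t: int, device) -> torch.Tensor:
        # True = blocked. Frame i may attend to frames [i - window + 1, i] only,
        # which keeps generation both causal and local.
        i = torch.arange(t, device=device)
        return (i[None, :] > i[:, None]) | (i[:, None] - i[None, :] >= self.window)

    def forward(self, tokens: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, dim) embedded motion tokens; speech: (B, T, dim) audio features
        mask = self._window_causal_mask(tokens.size(1), tokens.device)
        return self.decoder(tgt=tokens, memory=speech, tgt_mask=mask)


if __name__ == "__main__":
    B, T, D = 2, 60, 128
    mouth_feat, pose_feat = torch.randn(B, T, D), torch.randn(B, T, D)
    speech_feat = torch.randn(B, T, D)

    # Separate codebooks keep mouth movement and head pose disentangled.
    mouth_vq, pose_vq = VectorQuantizer(), VectorQuantizer()
    q_mouth, mouth_ids = mouth_vq(mouth_feat)
    q_pose, pose_ids = pose_vq(pose_feat)

    # Summing the two attribute embeddings as decoder input is a simplification.
    decoder = WindowedARDecoder()
    out = decoder(q_mouth + q_pose, speech_feat)
    print(out.shape, mouth_ids.shape, pose_ids.shape)   # (2, 60, 128), (2, 60), (2, 60)
```

Keeping two independent codebooks is what makes head pose controllable separately from lip motion in such a design: one token stream can be resampled or edited without disturbing the other.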
Related papers
- MMHead: Towards Fine-grained Multi-modal 3D Facial Animation [68.04052669266174]
We construct a large-scale multi-modal 3D facial animation dataset, MMHead.
MMHead consists of 49 hours of 3D facial motion sequences, speech audios, and rich hierarchical text annotations.
Based on the MMHead dataset, we establish benchmarks for two new tasks: text-induced 3D talking head animation and text-to-3D facial motion generation.
arXiv Detail & Related papers (2024-10-10T09:37:01Z)
- Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation [41.489700112318864]
Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style.
We introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding.
We also propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions.
arXiv Detail & Related papers (2023-12-18T01:49:42Z)
- 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [22.30870274645442]
We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing.
Our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.
arXiv Detail & Related papers (2023-12-01T19:01:05Z)
- AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach.
It learns the personalized talking style from a reference video of about 10 seconds.
It generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z)
- DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
At the same time, it achieves more realistic facial animation than state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with a personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.