Speech-Driven 3D Face Animation with Composite and Regional Facial
Movements
- URL: http://arxiv.org/abs/2308.05428v1
- Date: Thu, 10 Aug 2023 08:42:20 GMT
- Authors: Haozhe Wu, Songtao Zhou, Jia Jia, Junliang Xing, Qi Wen, Xiang Wen
- Score: 30.348768852726295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-driven 3D face animation poses significant challenges due to the
intricacy and variability inherent in human facial movements. This paper
emphasizes the importance of considering both the composite and regional
natures of facial movements in speech-driven 3D face animation. The composite
nature pertains to how speech-independent factors globally modulate
speech-driven facial movements along the temporal dimension. Meanwhile, the
regional nature alludes to the notion that facial movements are not globally
correlated but are actuated by local musculature along the spatial dimension.
It is thus indispensable to incorporate both natures for engendering vivid
animation. To address the composite nature, we introduce an adaptive modulation
module that employs arbitrary facial movements to dynamically adjust
speech-driven facial movements across frames on a global scale. To accommodate
the regional nature, our approach ensures that each constituent of the facial
features for every frame focuses on the local spatial movements of 3D faces.
Moreover, we present a non-autoregressive backbone for translating audio to 3D
facial movements, which maintains high-frequency nuances of facial movements
and facilitates efficient inference. Comprehensive experiments and user studies
demonstrate that our method surpasses contemporary state-of-the-art approaches
both qualitatively and quantitatively.
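As a rough illustration of the composite idea in the abstract (speech-independent movements globally modulating speech-driven ones along the temporal dimension), the sketch below transfers per-channel temporal statistics from a reference motion sequence onto speech-driven features. This is an AdaIN-style stand-in chosen for illustration, not the paper's adaptive modulation module, whose design is not given here. The names `channel_stats` and `modulate` and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def channel_stats(seq):
    """Return per-channel (means, stds) computed over time for a T x C sequence."""
    channels = list(zip(*seq))  # transpose: one tuple of T values per channel
    means = [fmean(c) for c in channels]
    stds = [pstdev(c) for c in channels]
    return means, stds


def modulate(speech_feats, reference_motion, eps=1e-6):
    """AdaIN-style global modulation (an assumption, not the paper's module).

    Each channel of speech_feats is normalized over all frames, then
    rescaled and shifted with the temporal statistics of reference_motion.
    """
    s_mean, s_std = channel_stats(speech_feats)
    r_mean, r_std = channel_stats(reference_motion)
    out = []
    for frame in speech_feats:
        out.append([
            (x - sm) / (ss + eps) * rs + rm
            for x, sm, ss, rs, rm in zip(frame, s_mean, s_std, r_std, r_mean)
        ])
    return out


# Toy data: 3 frames x 2 channels of speech-driven features,
# and 4 frames x 2 channels of arbitrary reference facial motion.
speech = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
reference = [[10.0, 0.0], [10.5, 0.2], [11.0, 0.4], [10.5, 0.2]]
modulated = modulate(speech, reference)
```

The modulated sequence keeps the frame-to-frame shape of the speech-driven features but takes on the global offset and scale of the reference, and the two sequences do not need the same number of frames.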
Related papers
- 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [22.30870274645442]
We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing.
Our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.
(2023-12-01T19:01:05Z)
- Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape [19.431264557873117]
We introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation.
We explicitly disentangle facial animation into head pose and mouth movement and encode them separately.
We construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content.
(2023-10-31T07:47:19Z)
- DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
It simultaneously achieves more realistic facial animation than the state-of-the-art methods.
(2023-08-23T04:14:55Z)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
(2023-04-18T12:36:15Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
(2023-02-24T09:36:31Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
(2022-12-30T19:00:02Z)
- Generating Holistic 3D Human Motion from Speech [97.11392166257791]
We build a high-quality dataset of 3D holistic body meshes with synchronous speech.
We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately.
(2022-12-08T17:25:19Z)
- A Novel Speech-Driven Lip-Sync Model with CNN and LSTM [12.747541089354538]
We present a combined deep neural network of one-dimensional convolutions and LSTM to generate displacement of a 3D template face model from variable-length speech input.
To make the network more robust to different sound signals, we adapt a trained speech recognition model to extract speech features.
We show that our model is able to generate smooth and natural lip movements synchronized with speech.
(2022-05-02T13:57:50Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, as well as plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
(2021-04-16T17:05:40Z)