Learning Audio-Driven Viseme Dynamics for 3D Face Animation
- URL: http://arxiv.org/abs/2301.06059v1
- Date: Sun, 15 Jan 2023 09:55:46 GMT
- Title: Learning Audio-Driven Viseme Dynamics for 3D Face Animation
- Authors: Linchao Bao, Haoxian Zhang, Yue Qian, Tangli Xue, Changhai Chen, Xuefei Zhe, Di Kang
- Abstract summary: We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from the input audio.
Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs.
- Score: 17.626644507523963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel audio-driven facial animation approach that can generate
realistic lip-synchronized 3D facial animations from the input audio. Our
approach learns viseme dynamics from speech videos, produces animator-friendly
viseme curves, and supports multilingual speech inputs. The core of our
approach is a novel parametric viseme fitting algorithm that utilizes phoneme
priors to extract viseme parameters from speech videos. With the guidance of
phonemes, the extracted viseme curves correlate better with phonemes and are
thus more controllable and friendlier to animators. To support multilingual speech
inputs and generalizability to unseen voices, we take advantage of deep audio
feature models pretrained on multiple languages to learn the mapping from audio
to viseme curves. Our audio-to-curves mapping achieves state-of-the-art
performance even when the input audio suffers from distortions of volume,
pitch, speed, or noise. Lastly, a viseme scanning approach for acquiring
high-fidelity viseme assets is presented for efficient speech animation
production. We show that the predicted viseme curves can be applied to
different viseme-rigged characters to yield various personalized animations
with realistic and natural facial motions. Our approach is artist-friendly and
can be easily integrated into typical animation production workflows, including
blendshape- or bone-based animation.
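The abstract says the predicted viseme curves can drive different viseme-rigged characters through blendshapes. The sketch below illustrates that idea under stated assumptions and is not the authors' pipeline. It assumes a standard linear delta-blendshape rig with one sculpted target mesh per viseme. Each output frame is the neutral face plus the curve-weighted sum of viseme offsets. The function name `apply_viseme_curves`, the array layouts, and the clamping of weights to [0, 1] are all hypothetical choices. Bone-based rigs would need a different mapping.

```python
import numpy as np


def apply_viseme_curves(neutral, viseme_targets, curves):
    """Hypothetical helper: evaluate a linear delta-blendshape rig per frame.

    neutral:        (V, 3) rest-pose vertex positions
    viseme_targets: (K, V, 3) one sculpted target mesh per viseme
    curves:         (T, K) per-frame viseme weights (e.g. predicted curves)
    returns:        (T, V, 3) animated vertex positions
    """
    neutral = np.asarray(neutral, dtype=float)
    viseme_targets = np.asarray(viseme_targets, dtype=float)
    curves = np.asarray(curves, dtype=float)

    # Basic shape checks so mismatched rigs fail loudly.
    if viseme_targets.shape[1:] != neutral.shape:
        raise ValueError("each viseme target must match the neutral mesh shape")
    if curves.ndim != 2 or curves.shape[1] != viseme_targets.shape[0]:
        raise ValueError("curves must have shape (T, K) with K = number of visemes")

    # Assumption: weights are kept in [0, 1], a common convention for
    # animator-facing blendshape sliders.
    weights = np.clip(curves, 0.0, 1.0)

    # Offsets of each viseme target from the neutral face: (K, V, 3).
    deltas = viseme_targets - neutral[None]

    # Weighted sum of offsets per frame, added to the neutral face.
    return neutral[None] + np.einsum("tk,kvc->tvc", weights, deltas)


if __name__ == "__main__":
    # Toy rig: 3 vertices, 2 visemes, 3 frames of curve values.
    neutral = np.zeros((3, 3))
    targets = np.stack([np.ones((3, 3)), 2.0 * np.ones((3, 3))])
    curves = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
    frames = apply_viseme_curves(neutral, targets, curves)
    print(frames.shape)
    print(frames[2, 0])
```

Because the curves are independent of the mesh, the same (T, K) curves can be reused on any character whose rig exposes the same K visemes. Only `neutral` and `viseme_targets` change.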
Related papers
- LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement [8.973545189395953]
This study focuses on the creation of visually compelling, time-synchronized animations through diffusion-based techniques.
We process audio features separately and derive the corresponding control gates, which implicitly govern the movements in the mouth, eyes, and head, irrespective of the portrait's origin.
The significant improvements in the fidelity of animated portraits, the accuracy of lip-syncing, and the appropriate motion variations achieved by our method render it a versatile tool for animating any portrait in any language.
(2024-07-26T08:30:06Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
(2024-04-17T17:59:55Z)
- Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation [41.489700112318864]
Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style.
We introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding.
We also propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions.
(2023-12-18T01:49:42Z)
- 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [22.30870274645442]
We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing.
Our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.
(2023-12-01T19:01:05Z)
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [0.0]
FaceXHuBERT is a text-less speech-driven 3D facial animation generation method.
It is very robust to background noise and can handle audio recorded in a variety of situations.
In a perceptual user study, its animations were judged more realistic than those of the state of the art 78% of the time.
(2023-03-09T17:05:19Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
(2022-12-30T19:00:02Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, which aims to animate a static face image with the help of language.
We propose a recurrent motion generator that extracts semantic and motion information from the language and feeds it, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
(2022-08-11T02:57:30Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
(2022-04-18T17:58:04Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
(2021-04-16T17:05:40Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
(2020-08-11T22:28:48Z)