Imitator: Personalized Speech-driven 3D Facial Animation
- URL: http://arxiv.org/abs/2301.00023v1
- Date: Fri, 30 Dec 2022 19:00:02 GMT
- Title: Imitator: Personalized Speech-driven 3D Facial Animation
- Authors: Balamurugan Thambiraja, Ikhsanul Habibie, Sadegh Aliakbarian, Darren
Cosker, Christian Theobalt, Justus Thies
- Abstract summary: State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
- Score: 63.57811510502906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-driven 3D facial animation has been widely explored, with applications
in gaming, character animation, virtual reality, and telepresence systems.
State-of-the-art methods deform the face topology of the target actor to sync
the input audio without considering the identity-specific speaking style and
facial idiosyncrasies of the target actor, thus resulting in unrealistic and
inaccurate lip movements. To address this, we present Imitator, a speech-driven
facial expression synthesis method, which learns identity-specific details from
a short input video and produces novel facial expressions matching the
identity-specific speaking style and facial idiosyncrasies of the target actor.
Specifically, we train a style-agnostic transformer on a large facial
expression dataset which we use as a prior for audio-driven facial expressions.
Based on this prior, we optimize for identity-specific speaking style based on
a short reference video. To train the prior, we introduce a novel loss function
based on detected bilabial consonants to ensure plausible lip closures and
consequently improve the realism of the generated expressions. Through detailed
experiments and a user study, we show that our approach produces temporally
coherent facial expressions from input audio while preserving the speaking
style of the target actors.
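The abstract describes the bilabial-consonant loss only at a high level. As a rough illustration of how such a term could look, the sketch below penalizes the distance between upper- and lower-lip vertices on exactly those frames where a forced aligner has marked a bilabial consonant (/m/, /b/, /p/) in the audio; the vertex indices, tensor shapes, and function name are hypothetical, not the paper's implementation.

```python
import torch

# Hypothetical lip-contour vertex indices; the paper's face topology and
# the actual indices are not given in the abstract.
UPPER_LIP_IDX = [10, 11, 12, 13]
LOWER_LIP_IDX = [20, 21, 22, 23]

def bilabial_closure_loss(pred_verts: torch.Tensor,
                          bilabial_mask: torch.Tensor) -> torch.Tensor:
    """Penalize any lip gap on frames aligned to /m/, /b/, /p/.

    pred_verts:    (T, V, 3) predicted face vertices per frame.
    bilabial_mask: (T,) bool, True where a forced aligner detected a
                   bilabial consonant in the driving audio.
    """
    upper = pred_verts[:, UPPER_LIP_IDX, :]           # (T, K, 3)
    lower = pred_verts[:, LOWER_LIP_IDX, :]           # (T, K, 3)
    gap = (upper - lower).norm(dim=-1).mean(dim=-1)   # (T,) mean lip gap
    mask = bilabial_mask.float()
    # Average gap over bilabial frames only; lips should touch there.
    return (gap * mask).sum() / mask.sum().clamp(min=1.0)
```

Such a term would be added to the usual vertex reconstruction loss when training the style-agnostic prior, with the mask computed once per training clip from a phoneme-level alignment of the audio.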
Related papers
- MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes [74.82911268630463]
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos.
MimicTalk exploits the rich knowledge from a NeRF-based person-agnostic generic model to improve the efficiency and robustness of personalized TFG.
Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness.
arXiv Detail & Related papers (2024-10-09T10:12:37Z)
- Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation [41.489700112318864]
Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style.
We introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding.
We also propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions (a generic sketch of this kind of disentanglement follows this entry).
arXiv Detail & Related papers (2023-12-18T01:49:42Z)
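The Mimic summary only names the goal of disentangling speaking style from content. The sketch below shows one generic way such a setup is often structured, with a time-resolved content encoder and a time-pooled style encoder feeding a shared decoder; all module choices and dimensions are illustrative assumptions, not Mimic's actual architecture.

```python
import torch
import torch.nn as nn

class StyleContentAutoencoder(nn.Module):
    """Generic style/content disentanglement over facial-motion sequences.

    Input is (B, T, D) per-frame motion features. The content path keeps
    the time axis (what is said, when); the style path pools time away
    into a single code meant to capture the identity's speaking style.
    """

    def __init__(self, motion_dim: int = 256, style_dim: int = 64):
        super().__init__()
        self.content_enc = nn.GRU(motion_dim, motion_dim, batch_first=True)
        self.style_enc = nn.Sequential(
            nn.Linear(motion_dim, style_dim), nn.ReLU(),
            nn.Linear(style_dim, style_dim),
        )
        self.decoder = nn.GRU(motion_dim + style_dim, motion_dim,
                              batch_first=True)

    def forward(self, motion: torch.Tensor) -> torch.Tensor:
        content, _ = self.content_enc(motion)               # (B, T, D)
        style = self.style_enc(motion.mean(dim=1))          # (B, S)
        style = style.unsqueeze(1).expand(-1, motion.size(1), -1)
        recon, _ = self.decoder(torch.cat([content, style], dim=-1))
        return recon
```

Actual disentanglement additionally requires losses the summary does not detail, typically swapping the style codes of two clips from different speakers and asking the decoder to reproduce each clip's content in the other's style.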
- Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control [1.8540152959438578]
A realistic facial animation system should consider such identity-specific speaking styles and facial idiosyncrasies to achieve a high degree of naturalness and plausibility.
We present a speech-driven expressive 3D facial animation synthesis framework that models identity-specific facial motion as latent representations (called styles).
Our framework is trained in an end-to-end fashion and has a non-autoregressive encoder-decoder architecture with three main components.
arXiv Detail & Related papers (2023-10-25T21:22:28Z)
- AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach.
It learns the personalized talking style from a reference video of about 10 seconds.
It generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate the natural variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network (a generic sketch of this idea follows this entry).
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z)
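The DIRFA summary names a transformer-based probabilistic mapping network without further detail. As a generic sketch of the idea, the model below predicts a diagonal Gaussian over motion parameters per frame and samples from it, so repeated draws give diverse yet plausible animations for the same audio; the layer sizes and head design are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class ProbabilisticAudioToMotion(nn.Module):
    """Map audio features to a per-frame distribution over facial motion."""

    def __init__(self, audio_dim: int = 128, motion_dim: int = 64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=audio_dim, nhead=4,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        # Mean and log-variance of a diagonal Gaussian per frame.
        self.head = nn.Linear(audio_dim, 2 * motion_dim)

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        """audio_feats: (B, T, audio_dim) -> sampled motion (B, T, motion_dim)."""
        h = self.backbone(audio_feats)
        mu, log_var = self.head(h).chunk(2, dim=-1)
        # Reparameterized sample: each draw yields a different, but still
        # plausible, animation for the same input audio.
        return mu + torch.randn_like(mu) * (0.5 * log_var).exp()
```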
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [0.0]
FaceXHuBERT is a text-less speech-driven 3D facial animation generation method.
It is very robust to background noise and can handle audio recorded in a variety of situations.
It produces results rated superior with respect to the realism of the animation 78% of the time.
arXiv Detail & Related papers (2023-03-09T17:05:19Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
- Identity-Preserving Realistic Talking Face Generation [4.848016645393023]
We propose a method for identity-preserving realistic facial animation from speech.
We impose eye blinks on facial landmarks using unsupervised learning.
We also use LSGAN to generate the facial texture from person-specific facial landmarks.
arXiv Detail & Related papers (2020-05-25T18:08:28Z)