Identity-Preserving Realistic Talking Face Generation
- URL: http://arxiv.org/abs/2005.12318v1
- Date: Mon, 25 May 2020 18:08:28 GMT
- Title: Identity-Preserving Realistic Talking Face Generation
- Authors: Sanjana Sinha, Sandika Biswas and Brojeshwar Bhowmick
- Abstract summary: We propose a method for identity-preserving realistic facial animation from speech.
We impose eye blinks on facial landmarks using unsupervised learning.
We also use LSGAN to generate the facial texture from person-specific facial landmarks.
- Score: 4.848016645393023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-driven facial animation is useful for a variety of applications such
as telepresence, chatbots, etc. The necessary attributes of a realistic
face animation are (1) audio-visual synchronization, (2) identity preservation of
the target individual, (3) plausible mouth movements, and (4) the presence of natural eye
blinks. Existing methods mostly address audio-visual lip
synchronization, and few recent works have addressed the synthesis of natural
eye blinks for overall video realism. In this paper, we propose a method for
identity-preserving realistic facial animation from speech. We first generate
person-independent facial landmarks from audio using DeepSpeech features for
invariance to different voices, accents, etc. To add realism, we impose eye
blinks on the facial landmarks using unsupervised learning, and we retarget the
person-independent landmarks to person-specific landmarks to preserve the
identity-related facial structure, which helps generate plausible
mouth shapes for the target identity. Finally, we use an LSGAN to generate the
facial texture from person-specific facial landmarks, using an attention
mechanism that helps to preserve identity-related texture. An extensive
comparison of our proposed method with the current state-of-the-art methods
demonstrates a significant improvement in terms of lip synchronization
accuracy, image reconstruction quality, sharpness, and identity-preservation. A
user study also reveals improved realism of our animation results over the
state-of-the-art methods. To the best of our knowledge, this is the first work
in speech-driven 2D facial animation that simultaneously addresses all the
above-mentioned attributes of a realistic speech-driven face animation.
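The abstract describes a four-stage pipeline: DeepSpeech audio features to person-independent landmarks, unsupervised blink imposition, retargeting to person-specific landmarks, and LSGAN-based texture generation. The sketch below illustrates only the data flow with NumPy placeholders; all function names, shapes, the random linear maps, and the blink heuristic are hypothetical stand-ins for the paper's trained networks, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline's data flow (assumptions: 68-point
# landmarks, 29-dim DeepSpeech per-frame features; the "models" are random
# placeholders, not the paper's trained networks).
import numpy as np

N_LANDMARKS = 68     # standard 2D facial landmark count (assumption)
AUDIO_FEAT_DIM = 29  # DeepSpeech per-frame character-probability dimension

def audio_to_landmarks(audio_feats, rng):
    """Stage 1: person-independent landmarks from audio features.
    A random linear map stands in for the trained speech-to-landmark network."""
    T = audio_feats.shape[0]
    W = rng.standard_normal((AUDIO_FEAT_DIM, N_LANDMARKS * 2)) * 0.01
    return (audio_feats @ W).reshape(T, N_LANDMARKS, 2)

def impose_blinks(landmarks, rng, blink_prob=0.05):
    """Stage 2: impose eye blinks (learned unsupervised in the paper;
    here, a random vertical offset on the eye landmarks)."""
    out = landmarks.copy()
    eye_idx = np.arange(36, 48)  # eye landmarks in the 68-point scheme
    for t in range(out.shape[0]):
        if rng.random() < blink_prob:
            out[t, eye_idx, 1] += 0.02  # lower the eyelid landmarks
    return out

def retarget(landmarks, target_neutral, generic_neutral):
    """Stage 3: transfer per-frame motion onto the target identity's
    facial structure by re-applying displacements to its neutral face."""
    motion = landmarks - generic_neutral
    return target_neutral + motion

def landmarks_to_frames(landmarks, size=64):
    """Stage 4: texture generation (an LSGAN with attention in the paper;
    here, blank frames of the right shape)."""
    T = landmarks.shape[0]
    return np.zeros((T, size, size, 3))

rng = np.random.default_rng(0)
audio = rng.standard_normal((100, AUDIO_FEAT_DIM))   # 100 audio frames
lm = audio_to_landmarks(audio, rng)
lm = impose_blinks(lm, rng)
lm = retarget(lm,
              target_neutral=rng.standard_normal((N_LANDMARKS, 2)),
              generic_neutral=rng.standard_normal((N_LANDMARKS, 2)))
frames = landmarks_to_frames(lm)
print(frames.shape)  # one RGB frame per audio frame
```

The point of the decomposition is that identity enters only at stages 3 and 4: the audio-to-landmark mapping stays person-independent, so one model generalizes across voices.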
Related papers
- Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation [41.489700112318864]
Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style.
We introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding.
We also propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions.
arXiv Detail & Related papers (2023-12-18T01:49:42Z)
- Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control [1.8540152959438578]
A realistic facial animation system should account for such identity-specific speaking styles and facial idiosyncrasies to achieve a high degree of naturalness and plausibility.
We present a speech-driven expressive 3D facial animation synthesis framework that models identity-specific facial motion as latent representations (called styles).
Our framework is trained in an end-to-end fashion and has a non-autoregressive encoder-decoder architecture with three main components.
arXiv Detail & Related papers (2023-10-25T21:22:28Z)
- DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
It simultaneously achieves more realistic facial animation than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate plausible variation in facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z)
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [0.0]
FaceXHuBERT is a text-less speech-driven 3D facial animation generation method.
It is very robust to background noise and can handle audio recorded in a variety of situations.
It produces animations rated superior in realism 78% of the time.
arXiv Detail & Related papers (2023-03-09T17:05:19Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.