Emotionally Enhanced Talking Face Generation
- URL: http://arxiv.org/abs/2303.11548v2
- Date: Sun, 26 Mar 2023 04:41:50 GMT
- Title: Emotionally Enhanced Talking Face Generation
- Authors: Sahil Goyal, Shagun Uppal, Sarthak Bhagat, Yi Yu, Yifang Yin, Rajiv
Ratn Shah
- Abstract summary: We build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions.
We show that our model can adapt to arbitrary identities, emotions, and languages.
Our proposed framework includes a user-friendly web interface that provides a real-time experience for talking face generation with emotions.
- Score: 52.07451348895041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several works have developed end-to-end pipelines for generating lip-synced
talking faces with various real-world applications, such as teaching and
language translation in videos. However, these prior works fail to create
realistic-looking videos because they pay little attention to people's
expressions and emotions. Moreover, these methods' effectiveness largely
depends on the faces
in the training dataset, which means they may not perform well on unseen faces.
To mitigate this, we build a talking face generation framework conditioned on a
categorical emotion to generate videos with appropriate expressions, making
them more realistic and convincing. With six emotions, i.e., happiness,
sadness, fear, anger, disgust, and neutral, we show that our model can adapt
to arbitrary identities, emotions, and languages. Our proposed framework
includes a user-friendly web interface that provides a real-time experience
for emotional talking face generation. We also conduct a user study for subjective
evaluation of our interface's usability, design, and functionality. Project
page: https://midas.iiitd.edu.in/emo/
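The listing does not include reference code, but the central idea of conditioning generation on a categorical emotion can be illustrated with a minimal sketch: embed the emotion label, fuse it with audio and identity features, and decode a frame. This is an illustrative sketch only; the class names, feature dimensions, and concatenation-based fusion below are assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of an emotion-conditioned talking face generator.
# All names, dimensions, and the concatenation-based fusion are illustrative
# assumptions; this is not the authors' released implementation.
import torch
import torch.nn as nn

EMOTIONS = ["happiness", "sadness", "fear", "anger", "disgust", "neutral"]

class EmotionConditionedGenerator(nn.Module):
    def __init__(self, audio_dim=256, identity_dim=256, emotion_dim=64):
        super().__init__()
        # Categorical emotion label -> learned embedding vector.
        self.emotion_embedding = nn.Embedding(len(EMOTIONS), emotion_dim)
        # Fuse audio, identity, and emotion features into one latent code.
        self.fuse = nn.Sequential(
            nn.Linear(audio_dim + identity_dim + emotion_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
        )
        # Toy decoder producing a low-resolution face frame.
        self.decode = nn.Sequential(
            nn.Linear(512, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, audio_feat, identity_feat, emotion_id):
        emo = self.emotion_embedding(emotion_id)             # (B, emotion_dim)
        z = self.fuse(torch.cat([audio_feat, identity_feat, emo], dim=-1))
        return self.decode(z).view(-1, 3, 64, 64)            # (B, 3, 64, 64)

# Usage: one batch element conditioned on the "happiness" label.
gen = EmotionConditionedGenerator()
audio = torch.randn(1, 256)       # placeholder audio embedding
identity = torch.randn(1, 256)    # placeholder identity embedding
emotion = torch.tensor([EMOTIONS.index("happiness")])
print(gen(audio, identity, emotion).shape)  # torch.Size([1, 3, 64, 64])
```

Swapping the emotion index changes only the conditioning vector, which is what lets a single model cover all six categories for arbitrary identities.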
Related papers
- DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation [14.07086606183356]
Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications.
Current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion.
We introduce DEEPTalk, a novel approach that generates diverse and emotionally rich 3D facial expressions directly from speech inputs.
arXiv Detail & Related papers (2024-08-12T08:56:49Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation [13.459396544300137]
We propose a novel one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from the general Talking Face Generation.
We introduce Inter-Reconstructed Feature Disentanglement (IRFD) module to decouple facial features into three latent spaces.
We then design a face editing module that merges the speech content and facial latent codes into a single latent space.
arXiv Detail & Related papers (2024-05-12T11:41:44Z)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [34.5592743467339]
We propose a visual attribute-guided audio decoupler to generate fine-grained facial animations.
To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module.
Our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization.
arXiv Detail & Related papers (2024-02-02T14:04:18Z)
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- ChatAnything: Facetime Chat with LLM-Enhanced Personas [87.76804680223003]
We propose the mixture of voices (MoV) and the mixture of diffusers (MoD) for diverse voice and appearance generation.
For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones.
For MoD, we combine recent popular text-to-image generation techniques and talking head algorithms to streamline the process of generating talking objects.
arXiv Detail & Related papers (2023-11-12T08:29:41Z)
- AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach.
It learns the personalized talking style from a reference video of about 10 seconds.
It generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z)
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation.
arXiv Detail & Related papers (2023-05-04T05:59:34Z)
- Emotion-Controllable Generalized Talking Face Generation [6.22276955954213]
We propose a one-shot facial geometry-aware emotional talking face generation method.
Our method can adapt to arbitrary faces captured in-the-wild by fine-tuning with only a single image of the target identity in neutral emotion.
arXiv Detail & Related papers (2022-05-02T18:41:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.