Emotion-Controllable Generalized Talking Face Generation
- URL: http://arxiv.org/abs/2205.01155v1
- Date: Mon, 2 May 2022 18:41:36 GMT
- Title: Emotion-Controllable Generalized Talking Face Generation
- Authors: Sanjana Sinha, Sandika Biswas, Ravindra Yadav and Brojeshwar Bhowmick
- Abstract summary: We propose a one-shot facial geometry-aware emotional talking face generation method.
Our method can adapt to arbitrary faces captured in-the-wild by fine-tuning with only a single image of the target identity in neutral emotion.
- Score: 6.22276955954213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the significant progress in recent years, very few of the AI-based
talking face generation methods attempt to render natural emotions. Moreover,
the scope of these methods is largely limited to the characteristics of the
training dataset, hence they fail to generalize to arbitrary unseen faces. In
this paper, we propose a one-shot facial geometry-aware emotional talking face
generation method that can generalize to arbitrary faces. We propose a graph
convolutional neural network that uses speech content features, along with an
independent emotion input to generate emotion and speech-induced motion on
facial geometry-aware landmark representation. This representation is further
used in our optical flow-guided texture generation network for producing the
texture. We propose a two-branch texture generation network, with motion and
texture branches designed to consider the motion and texture content
independently. Compared to previous emotional talking face methods, our
method can adapt to arbitrary faces captured in-the-wild by fine-tuning with
only a single image of the target identity in neutral emotion.
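Read as a pipeline, the method has two stages: a graph-convolutional predictor that moves a facial-landmark graph according to the speech content features and an independent emotion code, followed by an optical-flow-guided, two-branch network that turns the predicted landmarks plus a single reference image into texture. The PyTorch sketch below only illustrates that structure; the module names, feature dimensions, landmark adjacency, and conditioning scheme are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LandmarkGCNLayer(nn.Module):
    """One graph-convolution layer over facial landmarks with a fixed adjacency."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)        # (L, L) row-normalized adjacency (assumed)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                            # x: (B, L, in_dim)
        return F.relu(self.linear(self.A @ x))       # aggregate neighboring landmarks

class SpeechEmotionToLandmarks(nn.Module):
    """Stage 1: predict emotion- and speech-induced landmark displacements."""
    def __init__(self, num_landmarks=68, speech_dim=256, emotion_dim=8, hidden=128):
        super().__init__()
        A = torch.eye(num_landmarks)                 # placeholder adjacency (assumption)
        self.gcn1 = LandmarkGCNLayer(2 + speech_dim + emotion_dim, hidden, A)
        self.gcn2 = LandmarkGCNLayer(hidden, 2, A)   # per-landmark (dx, dy)

    def forward(self, neutral_lm, speech_feat, emotion_code):
        # neutral_lm: (B, L, 2); speech_feat: (B, speech_dim); emotion_code: (B, emotion_dim)
        B, L, _ = neutral_lm.shape
        cond = torch.cat([speech_feat, emotion_code], dim=-1)
        cond = cond.unsqueeze(1).expand(B, L, cond.shape[-1])
        h = self.gcn1(torch.cat([neutral_lm, cond], dim=-1))
        return neutral_lm + self.gcn2(h)             # displaced landmarks

class TwoBranchTextureGenerator(nn.Module):
    """Stage 2: the motion branch predicts a dense flow that warps the reference frame;
    the texture branch adds a texture refinement on top of the warped result."""
    def __init__(self):
        super().__init__()
        self.motion_branch = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))              # optical-flow field (2, H, W)
        self.texture_branch = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())   # RGB residual

    def forward(self, ref_img, landmark_map):
        # ref_img: (B, 3, H, W); landmark_map: (B, 1, H, W) rasterized target landmarks
        x = torch.cat([ref_img, landmark_map], dim=1)
        flow = self.motion_branch(x)                     # (B, 2, H, W) sampling offsets
        B, _, H, W = ref_img.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=ref_img.device),
            torch.linspace(-1, 1, W, device=ref_img.device), indexing="ij")
        base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)
        grid = base_grid + flow.permute(0, 2, 3, 1)      # displace the sampling grid
        warped = F.grid_sample(ref_img, grid, align_corners=True)
        return warped + self.texture_branch(x)           # warped frame plus texture residual
```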
Related papers
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z) - Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z) - Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [12.044308738509402]
We propose a two-stage audio-driven talking face generation framework that employs 3D facial landmarks as intermediate variables.
This framework achieves collaborative alignment of expression, gaze, and pose with emotions through self-supervised learning.
Our model significantly advances the state-of-the-art performance in both visual quality and emotional alignment.
arXiv Detail & Related papers (2024-06-12T06:00:00Z) - CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [13.27632316528572]
Speech-driven 3D facial animation technology has been developed for years, but its practical applications still fall short of expectations.
Main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions.
This paper proposes a method called CSTalk that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions.
arXiv Detail & Related papers (2024-04-29T11:19:15Z) - FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization [4.429892245774265]
This paper proposes using normalizing flow and vector-quantization modeling to produce emotional talking faces.
Specifically, we develop a flow-based coefficient generator that encodes the dynamics of facial emotion into a multi-emotion-class latent space.
Our designed vector-quantization image generator treats the creation of expressive facial images as a code query task.
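The "code query" formulation is standard vector quantization: each encoder feature is replaced by its nearest entry in a learned codebook, so generating an expressive image amounts to querying codes. A minimal sketch of that lookup follows; the codebook size, feature shapes, and the straight-through trick are assumptions, not FlowVQTalker's exact design.

```python
import torch
import torch.nn as nn

class CodebookQuery(nn.Module):
    """Nearest-neighbor lookup into a learned codebook (VQ-VAE style)."""
    def __init__(self, num_codes=512, code_dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                                   # z: (B, N, code_dim) encoder features
        # squared distances to every codebook entry: (B, N, num_codes)
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        indices = d.argmin(dim=-1)                          # the "code query"
        quantized = self.codebook(indices)                  # replace each feature by its code
        # straight-through estimator so gradients still reach the encoder
        return z + (quantized - z).detach(), indices
```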
arXiv Detail & Related papers (2024-03-11T01:58:04Z) - High-fidelity Generalized Emotional Talking Face Generation with
Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation.
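A common way to realize such a unified emotion space is to give each modality its own projection head and align the resulting embeddings with a contrastive objective. The sketch below shows only that generic pattern; the encoders, dimensions, and loss form are assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalEmotionSpace(nn.Module):
    """Project text / image / audio emotion features into one shared space."""
    def __init__(self, text_dim=512, image_dim=512, audio_dim=256, shared_dim=128):
        super().__init__()
        self.text_head = nn.Linear(text_dim, shared_dim)
        self.image_head = nn.Linear(image_dim, shared_dim)
        self.audio_head = nn.Linear(audio_dim, shared_dim)

    def forward(self, text_feat, image_feat, audio_feat):
        # L2-normalize so cosine similarity reduces to a dot product
        t = F.normalize(self.text_head(text_feat), dim=-1)
        i = F.normalize(self.image_head(image_feat), dim=-1)
        a = F.normalize(self.audio_head(audio_feat), dim=-1)
        return t, i, a

def alignment_loss(t, i, a, temperature=0.07):
    """Pull matching (text, image, audio) emotion embeddings together."""
    def contrastive(x, y):
        logits = x @ y.t() / temperature                    # (B, B) similarity matrix
        targets = torch.arange(x.shape[0], device=x.device) # matching pairs on the diagonal
        return F.cross_entropy(logits, targets)
    return contrastive(t, i) + contrastive(t, a) + contrastive(i, a)
```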
arXiv Detail & Related papers (2023-05-04T05:59:34Z) - Emotionally Enhanced Talking Face Generation [52.07451348895041]
We build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions.
We show that our model can adapt to arbitrary identities, emotions, and languages.
Our proposed framework is equipped with a user-friendly web interface with a real-time experience for talking face generation with emotions.
arXiv Detail & Related papers (2023-03-21T02:33:27Z) - Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z) - EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
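A perceptual emotion consistency loss of this kind can be sketched as a feature-space distance, measured by a frozen emotion recognition network, between the input photo and the rendered 3D reconstruction; the network choice and the distance below are illustrative assumptions, not EMOCA's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def emotion_consistency_loss(emotion_net: nn.Module,
                             input_image: torch.Tensor,
                             rendered_image: torch.Tensor) -> torch.Tensor:
    """Distance between emotion features of the photo and of the rendered face."""
    with torch.no_grad():
        target = emotion_net(input_image)      # emotion features of the real photo (no grad)
    pred = emotion_net(rendered_image)         # gradients flow back through the renderer
    return F.mse_loss(pred, target)
```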
arXiv Detail & Related papers (2022-04-24T15:58:35Z) - Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.