Emotion-Controllable Generalized Talking Face Generation
- URL: http://arxiv.org/abs/2205.01155v1
- Date: Mon, 2 May 2022 18:41:36 GMT
- Title: Emotion-Controllable Generalized Talking Face Generation
- Authors: Sanjana Sinha, Sandika Biswas, Ravindra Yadav and Brojeshwar Bhowmick
- Abstract summary: We propose a one-shot facial geometry-aware emotional talking face generation method.
Our method can adapt to arbitrary faces captured in-the-wild by fine-tuning with only a single image of the target identity in neutral emotion.
- Score: 6.22276955954213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in recent years, very few AI-based
talking face generation methods attempt to render natural emotions. Moreover,
the scope of these methods is largely limited to the characteristics of the
training dataset, so they fail to generalize to arbitrary unseen faces. In
this paper, we propose a one-shot facial geometry-aware emotional talking face
generation method that can generalize to arbitrary faces. We propose a graph
convolutional neural network that uses speech content features, along with an
independent emotion input, to generate emotion- and speech-induced motion on a
facial geometry-aware landmark representation. This representation is further
used in our optical flow-guided texture generation network to produce the
texture. We propose a two-branch texture generation network, with motion and
texture branches designed to consider motion and texture content
independently. Compared to previous emotional talking face methods, our
method can adapt to arbitrary faces captured in the wild by fine-tuning with
only a single image of the target identity in neutral emotion.
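The data flow described in the abstract can be summarized as: a motion network conditioned on speech content features and an independent emotion input predicts displacements on a neutral landmark layout of the target identity. The following is a minimal NumPy sketch of that flow only; all dimensions, names, and the single linear layer standing in for the graph convolutional network are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 68 facial landmarks, 128-d speech feature,
# 6 discrete emotion classes (illustrative, not from the paper).
N_LANDMARKS, SPEECH_DIM, N_EMOTIONS = 68, 128, 6

def emotion_onehot(idx: int) -> np.ndarray:
    """Independent emotion input, encoded as a one-hot vector."""
    e = np.zeros(N_EMOTIONS)
    e[idx] = 1.0
    return e

def motion_generator(speech_feat: np.ndarray, emotion: np.ndarray,
                     W: np.ndarray) -> np.ndarray:
    """Stand-in for the graph convolutional motion network: maps the
    speech + emotion condition to per-landmark 2D displacements."""
    cond = np.concatenate([speech_feat, emotion])     # (134,)
    return np.tanh(cond @ W).reshape(N_LANDMARKS, 2)  # (68, 2)

# Neutral landmark layout of the target identity (the one-shot input).
neutral = rng.normal(size=(N_LANDMARKS, 2))
W = rng.normal(scale=0.1, size=(SPEECH_DIM + N_EMOTIONS, N_LANDMARKS * 2))

speech_feat = rng.normal(size=SPEECH_DIM)
driven = neutral + motion_generator(speech_feat, emotion_onehot(2), W)
print(driven.shape)  # (68, 2): emotion- and speech-driven landmarks
```

In the paper, the resulting landmark representation then conditions the optical flow-guided, two-branch texture network; that rendering stage is omitted here.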
Related papers
- Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [12.044308738509402]
We propose a two-stage audio-driven talking face generation framework that employs 3D facial landmarks as intermediate variables.
This framework achieves collaborative alignment of expression, gaze, and pose with emotions through self-supervised learning.
Our model significantly advances the state-of-the-art performance in both visual quality and emotional alignment.
arXiv Detail & Related papers (2024-06-12T06:00:00Z) - CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [13.27632316528572]
Speech-driven 3D facial animation technology has been developed for years, but its practical applications still fall short of expectations.
Main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions.
This paper proposes a method called CSTalk that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions.
arXiv Detail & Related papers (2024-04-29T11:19:15Z) - FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization [4.429892245774265]
This paper proposes using normalizing Flow and Vector-Quantization modeling to produce emotional talking faces.
Specifically, we develop a flow-based coefficient generator that encodes the dynamics of facial emotion into a multi-emotion-class latent space.
Our designed vector-quantization image generator treats the creation of expressive facial images as a code query task.
arXiv Detail & Related papers (2024-03-11T01:58:04Z) - GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results than state-of-the-art methods, and demonstrate applicability across various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z) - High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to a structural representation.
arXiv Detail & Related papers (2023-05-04T05:59:34Z) - Emotionally Enhanced Talking Face Generation [52.07451348895041]
We build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions.
We show that our model can adapt to arbitrary identities, emotions, and languages.
Our proposed framework is equipped with a user-friendly web interface offering a real-time experience of emotional talking face generation.
arXiv Detail & Related papers (2023-03-21T02:33:27Z) - PERI: Part Aware Emotion Recognition In The Wild [4.206175795966693]
This paper focuses on emotion recognition using visual features.
We create part-aware spatial (PAS) images by extracting key regions from the input image using a mask generated from both body pose and facial landmarks.
We provide our results on the publicly available in the wild EMOTIC dataset.
arXiv Detail & Related papers (2022-10-18T20:01:40Z) - Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.67586172996843]
We propose a novel FER model, called Poker Face Vision Transformer or PF-ViT, to separate and recognize the disturbance-agnostic emotion from a static facial image.
PF-ViT generates its corresponding poker face without the need for paired images.
arXiv Detail & Related papers (2022-07-22T13:39:06Z) - EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
arXiv Detail & Related papers (2022-04-24T15:58:35Z) - Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z) - Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.