EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware
Motion Model
- URL: http://arxiv.org/abs/2205.15278v2
- Date: Tue, 31 May 2022 10:15:29 GMT
- Title: EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware
Motion Model
- Authors: Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun
Cao
- Abstract summary: We propose the Emotion-Aware Motion Model (EAMM) to generate one-shot emotional talking faces.
By incorporating the results from both modules, our method can generate satisfactory talking face results on arbitrary subjects.
- Score: 32.19539143308341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although significant progress has been made to audio-driven talking face
generation, existing methods either neglect facial emotion or cannot be applied
to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model
(EAMM) to generate one-shot emotional talking faces by involving an emotion
source video. Specifically, we first propose an Audio2Facial-Dynamics module,
which renders talking faces from audio-driven, unsupervised zero- and
first-order keypoint motion. Then, by exploring the motion model's properties,
we further propose an Implicit Emotion Displacement Learner that represents
emotion-related facial dynamics as linearly additive displacements to the
previously acquired motion representations. Comprehensive experiments
demonstrate that by incorporating the results from both modules, our method can
generate satisfactory talking face results on arbitrary subjects with realistic
emotion patterns.
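Read literally, the abstract describes a two-stage composition: the Audio2Facial-Dynamics module predicts unsupervised keypoint motion (zero-order positions plus first-order local-transform terms) from audio, and the Implicit Emotion Displacement Learner adds emotion-specific offsets on top. Below is a minimal numpy sketch of that additive combination only; the function names, tensor shapes, and placeholder predictors are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

K = 10                              # number of unsupervised keypoints (assumed)
rng = np.random.default_rng(0)

def audio2facial_dynamics(audio_feat, src_keypoints):
    """Stand-in for the Audio2Facial-Dynamics module: predict audio-driven
    zero-order keypoint positions and first-order (2x2 local-transform) terms
    relative to the keypoints of the single source image."""
    delta_kp = 0.01 * rng.standard_normal((K, 2))     # placeholder prediction
    jacobians = np.tile(np.eye(2), (K, 1, 1))         # placeholder prediction
    return src_keypoints + delta_kp, jacobians

def emotion_displacement(emotion_feat):
    """Stand-in for the Implicit Emotion Displacement Learner: map features of
    the emotion source video to additive keypoint displacements."""
    return 0.02 * rng.standard_normal((K, 2))         # placeholder prediction

# One-shot inputs (all stand-ins): source-image keypoints, an audio window,
# and features extracted from the emotion source video.
src_kp = rng.uniform(-1.0, 1.0, size=(K, 2))
audio_feat = rng.standard_normal(128)
emotion_feat = rng.standard_normal(64)

neutral_kp, jac = audio2facial_dynamics(audio_feat, src_kp)
emotional_kp = neutral_kp + emotion_displacement(emotion_feat)  # linearly additive
print(emotional_kp.shape)  # (10, 2): driving keypoints handed to the renderer
```

The point of the additive formulation is that the emotion term can be learned separately from the audio-to-motion mapping and then composed with it for arbitrary, unseen subjects.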
Related papers
- Audio-Driven Emotional 3D Talking-Head Generation [47.6666060652434]
We present a novel system for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions.
We propose a pose sampling method that generates natural idle-state (non-speaking) videos in response to silent audio inputs.
arXiv Detail & Related papers (2024-10-07T08:23:05Z)
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [5.954758598327494]
EMOdiffhead is a novel method for emotional talking head video generation.
It enables fine-grained control of emotion categories and intensities.
It achieves state-of-the-art performance compared to other emotion portrait animation methods.
arXiv Detail & Related papers (2024-09-11T13:23:22Z)
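The EMOdiffhead summary above only states that emotion category and intensity are controllable inputs to a diffusion-based generator. One generic way to encode such a condition (not necessarily the paper's) is an intensity-scaled category embedding concatenated with audio features, sketched below with hypothetical names.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised", "disgusted"]  # assumed label set
rng = np.random.default_rng(0)

def emotion_condition(category: str, intensity: float) -> np.ndarray:
    """Hypothetical conditioning vector: one-hot emotion scaled by intensity in [0, 1]."""
    onehot = np.zeros(len(EMOTIONS))
    onehot[EMOTIONS.index(category)] = 1.0
    return np.clip(intensity, 0.0, 1.0) * onehot

audio_feat = rng.standard_normal(256)                 # stand-in audio features
cond = np.concatenate([audio_feat, emotion_condition("happy", 0.7)])
# `cond` would be fed to the diffusion denoiser at every timestep; varying
# `intensity` continuously is what yields fine-grained emotion control.
print(cond.shape)
```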
- DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation [14.07086606183356]
Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications.
Current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion.
We introduce DEEPTalk, a novel approach that generates diverse and emotionally rich 3D facial expressions directly from speech inputs.
arXiv Detail & Related papers (2024-08-12T08:56:49Z)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [34.5592743467339]
We propose a visual attribute-guided audio decoupler to generate fine-grained facial animations.
To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module.
Our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization.
arXiv Detail & Related papers (2024-02-02T14:04:18Z)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
arXiv Detail & Related papers (2023-09-29T18:18:32Z)
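ELP's summary above describes facial motion as a composition of discrete motion-codewords sampled from a learned distribution, which also allows conditioning on a predetermined attitude. The sketch below illustrates that codebook-composition idea in the abstract; the codebook size, the weighted-sum composition rule, and the conditioning scheme are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_CODEWORDS, MOTION_DIM = 64, 32
# Learned motion-codewords (stand-in): each row is one discrete motion primitive.
codebook = rng.standard_normal((NUM_CODEWORDS, MOTION_DIM))

def sample_listener_motion(speaker_feat, attitude_logits, k=4):
    """Sample k codeword indices from a distribution shaped by the speaker
    features and a predetermined attitude, then compose them into one motion."""
    logits = 0.1 * codebook @ speaker_feat[:MOTION_DIM] + attitude_logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = rng.choice(NUM_CODEWORDS, size=k, replace=False, p=probs)
    return codebook[idx].mean(axis=0)   # composed fine-grained facial motion

speaker_feat = rng.standard_normal(128)
friendly_bias = 0.5 * rng.standard_normal(NUM_CODEWORDS)  # stand-in "attitude"
motion = sample_listener_motion(speaker_feat, friendly_bias)
print(motion.shape)   # (32,): one sampled, attitude-controllable listener motion
```

Because the output is sampled rather than regressed, repeated calls give diverse yet plausible listener responses, which matches the non-deterministic behaviour the entry emphasizes.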
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
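The representation-learning entry above pairs a contextualized embedding encoder with a multi-head probing model to classify 32 emotions. A minimal reading of that setup is several lightweight probe heads on top of a frozen sentence embedding, averaged into one 32-way prediction; the head count, pooling, and dimensions below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
EMB_DIM, NUM_EMOTIONS, NUM_HEADS = 768, 32, 4   # assumed sizes

# Stand-in for a frozen contextualized encoder output (one utterance embedding).
utterance_embedding = rng.standard_normal(EMB_DIM)

# Multi-head probe: independent linear heads whose logits are averaged.
heads = [0.01 * rng.standard_normal((NUM_EMOTIONS, EMB_DIM)) for _ in range(NUM_HEADS)]

logits = np.mean([W @ utterance_embedding for W in heads], axis=0)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(int(probs.argmax()))   # index of the predicted fine-grained emotion (0..31)
```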
- Audio-Driven Emotional Video Portraits [79.95687903497354]
We present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.
Specifically, we propose the Cross-Reconstructed Emotion Disentanglement technique to decompose speech into two decoupled spaces.
With the disentangled features, dynamic 2D emotional facial landmarks can be deduced.
Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits.
arXiv Detail & Related papers (2021-04-15T13:37:13Z)
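The EVP summary above is explicit about its data flow: speech is decomposed into two decoupled spaces (one for spoken content, one for emotion), and the disentangled codes drive 2D landmark prediction before face synthesis. The sketch below mimics that flow with placeholder encoders and a landmark decoder; the cross-reconstruction training trick is only hinted at by the code-swapping at the end, and all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
NUM_LANDMARKS = 68

def content_encoder(audio):
    return audio[:64]          # stand-in: code for what is being said

def emotion_encoder(audio):
    return audio[64:96]        # stand-in: code for how it is being said

def landmark_decoder(content, emotion):
    """Stand-in decoder from disentangled codes to dynamic 2D facial landmarks."""
    W = 0.01 * rng.standard_normal((NUM_LANDMARKS * 2, content.size + emotion.size))
    return (W @ np.concatenate([content, emotion])).reshape(NUM_LANDMARKS, 2)

angry_audio = rng.standard_normal(128)
neutral_audio = rng.standard_normal(128)

# Cross-combination: content of the neutral clip with emotion of the angry clip.
landmarks = landmark_decoder(content_encoder(neutral_audio),
                             emotion_encoder(angry_audio))
print(landmarks.shape)   # (68, 2): emotional landmarks fed to target-adaptive synthesis
```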
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings for low-resource multimodal emotion recognition.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
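The modality-transferable entry above claims zero- and few-shot recognition of unseen emotions via emotion embeddings, which is the classic "match features to label embeddings" recipe. The sketch below scores a fused multimodal feature against emotion word vectors by cosine similarity; the projection, toy label vectors, and scoring rule are assumptions rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(4)
EMB_DIM, FEAT_DIM = 50, 300

# Stand-in word embeddings for emotion labels (a real system might use GloVe vectors).
label_vecs = {name: rng.standard_normal(EMB_DIM)
              for name in ["happy", "sad", "angry", "fearful"]}

# Stand-in projection from fused audio/text/video features into the label space.
W = 0.01 * rng.standard_normal((EMB_DIM, FEAT_DIM))

def classify(multimodal_feat, labels):
    z = W @ multimodal_feat
    cos = {name: vec @ z / (np.linalg.norm(vec) * np.linalg.norm(z) + 1e-8)
           for name, vec in labels.items()}
    return max(cos, key=cos.get)

feat = rng.standard_normal(FEAT_DIM)
print(classify(feat, label_vecs))
# Zero-shot: an unseen emotion only needs its word embedding added to the label set.
label_vecs["surprised"] = rng.standard_normal(EMB_DIM)
print(classify(feat, label_vecs))
```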
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [28.52180268019401]
We propose a novel approach to rendering visual emotion expression in speech-driven talking face generation.
We design an end-to-end talking face generation system that takes a speech utterance, a single face image, and a categorical emotion label as input.
Objective evaluation on image quality, audiovisual synchronization, and visual emotion expression shows that the proposed system outperforms a state-of-the-art baseline system.
arXiv Detail & Related papers (2020-08-08T20:46:31Z)
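The last entry above is mostly an input/output description: one speech utterance, one face image, and a categorical emotion label go in, and an emotion-conditioned talking-face video comes out. A type-level sketch of that interface follows; the class name, label set, sample rate, and frame rate are hypothetical.

```python
import numpy as np

EMOTIONS = ("neutral", "happy", "sad", "angry")   # assumed categorical label set
FPS = 25                                          # assumed output frame rate

class EmotionalTalkingFaceGenerator:
    """Hypothetical end-to-end interface matching the described inputs and outputs."""

    def __call__(self, speech: np.ndarray, face_image: np.ndarray,
                 emotion: str) -> np.ndarray:
        assert emotion in EMOTIONS
        num_frames = max(1, int(len(speech) / 16000 * FPS))   # 16 kHz audio assumed
        # A real model would synthesize lip-synced, emotion-conditioned frames;
        # here we simply repeat the source image to the matching video length.
        return np.repeat(face_image[None], num_frames, axis=0)

gen = EmotionalTalkingFaceGenerator()
frames = gen(speech=np.zeros(32000), face_image=np.zeros((256, 256, 3)), emotion="happy")
print(frames.shape)   # (50, 256, 256, 3): one frame per 1/25 s of audio
```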
This list is automatically generated from the titles and abstracts of the papers on this site.