EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model
- URL: http://arxiv.org/abs/2401.08049v1
- Date: Tue, 16 Jan 2024 02:02:44 GMT
- Title: EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model
- Authors: Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang
- Abstract summary: EmoTalker is an emotionally editable portrait animation approach based on the diffusion model.
An Emotion Intensity Block is introduced to analyze fine-grained emotions and their strengths derived from prompts.
Experiments show the effectiveness of EmoTalker in generating high-quality, emotionally customizable facial expressions.
- Score: 39.14430238946951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the field of talking face generation has attracted
considerable attention, with certain methods adept at generating virtual faces
that convincingly imitate human expressions. However, existing methods face
challenges related to limited generalization, particularly when dealing with
challenging identities. Furthermore, methods for editing expressions are often
confined to a singular emotion, failing to adapt to intricate emotions. To
overcome these challenges, this paper proposes EmoTalker, an emotionally
editable portrait animation approach based on the diffusion model. EmoTalker
modifies the denoising process to ensure preservation of the original
portrait's identity during inference. To enhance emotion comprehension from
text input, an Emotion Intensity Block is introduced to analyze fine-grained
emotions and strengths derived from prompts. Additionally, a crafted dataset is
harnessed to enhance emotion comprehension within prompts. Experiments show the
effectiveness of EmoTalker in generating high-quality, emotionally customizable
facial expressions.
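Since the abstract only names the Emotion Intensity Block at a high level, the following is a minimal, hypothetical sketch of what such a block could look like, assuming it maps a text-prompt embedding to a fine-grained emotion distribution plus a scalar strength. The module layout, dimensions, and emotion set are illustrative assumptions, not EmoTalker's actual implementation.

```python
# Hypothetical sketch of an "Emotion Intensity Block": a small head that maps a
# text-prompt embedding to fine-grained emotion scores and a strength value.
# Dimensions, the emotion set, and the architecture are assumptions made for
# illustration; they are not taken from the EmoTalker paper.
import torch
import torch.nn as nn

EMOTIONS = ["happy", "sad", "angry", "fear", "disgust", "surprise", "neutral"]

class EmotionIntensityBlock(nn.Module):
    def __init__(self, prompt_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(prompt_dim, hidden_dim),
            nn.ReLU(),
        )
        # One head for the emotion-category distribution, one for its strength.
        self.emotion_head = nn.Linear(hidden_dim, len(EMOTIONS))
        self.intensity_head = nn.Linear(hidden_dim, 1)

    def forward(self, prompt_embedding: torch.Tensor):
        h = self.backbone(prompt_embedding)
        emotion_probs = torch.softmax(self.emotion_head(h), dim=-1)
        intensity = torch.sigmoid(self.intensity_head(h))  # strength in [0, 1]
        # The resulting emotion vector would then condition the denoising steps,
        # while the modified denoising process preserves the portrait's identity.
        return emotion_probs, intensity

# Usage with a dummy prompt embedding (e.g., from a frozen text encoder):
if __name__ == "__main__":
    block = EmotionIntensityBlock()
    embedding = torch.randn(1, 768)
    probs, strength = block(embedding)
    print(dict(zip(EMOTIONS, probs.squeeze(0).tolist())), strength.item())
```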
Related papers
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [5.954758598327494]
EMOdiffhead is a novel method for emotional talking head video generation.
It enables fine-grained control of emotion categories and intensities.
It achieves state-of-the-art performance compared to other emotion portrait animation methods.
arXiv Detail & Related papers (2024-09-11T13:23:22Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech [34.03787613163788]
EmoSphere-TTS synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech; a hypothetical coordinate sketch of this idea appears after this list.
We propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics.
arXiv Detail & Related papers (2024-06-12T01:40:29Z)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [34.5592743467339]
We propose a visual attribute-guided audio decoupler to generate fine-grained facial animations.
To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module.
Our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization.
arXiv Detail & Related papers (2024-02-02T14:04:18Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation.
arXiv Detail & Related papers (2023-05-04T05:59:34Z)
- Expressive Speech-driven Facial Animation with controllable emotions [12.201573788014622]
This paper presents a novel deep learning-based approach for expressive facial animation generation from speech.
It can exhibit wide-spectrum facial expressions with controllable emotion type and intensity.
It enables emotion-controllable facial animation, where the target expression can be continuously adjusted.
arXiv Detail & Related papers (2023-01-05T11:17:19Z)
- Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction [55.47134146639492]
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
Experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
arXiv Detail & Related papers (2021-06-06T06:26:15Z)
- Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)
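As referenced in the EmoSphere-TTS entry above, one way to read a "spherical emotion vector", purely as an illustrative assumption, is as a spherical-coordinate view of a valence-arousal-dominance point around a neutral center, where the radius acts as intensity and the two angles act as emotional style. The sketch below shows only that coordinate change, not the paper's actual formulation.

```python
# Illustrative sketch (not the EmoSphere-TTS implementation): map a
# valence-arousal-dominance (VAD) point, taken relative to a neutral center,
# into spherical coordinates where the radius plays the role of intensity and
# the two angles play the role of emotional style.
import math

def to_spherical_emotion(valence: float, arousal: float, dominance: float,
                         neutral=(0.0, 0.0, 0.0)):
    """Map a VAD point to (radius, azimuth, elevation) around a neutral center."""
    x = valence - neutral[0]
    y = arousal - neutral[1]
    z = dominance - neutral[2]
    radius = math.sqrt(x * x + y * y + z * z)      # emotion intensity
    azimuth = math.atan2(y, x)                     # style angle in the V-A plane
    elevation = math.atan2(z, math.hypot(x, y))    # style angle toward dominance
    return radius, azimuth, elevation

# Scaling only the radius changes the intensity while keeping the style angles
# fixed, which is one way to control style and intensity separately.
r, az, el = to_spherical_emotion(0.6, 0.4, 0.1)
half_intensity = (0.5 * r, az, el)
```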
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.