EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model
- URL: http://arxiv.org/abs/2401.08049v1
- Date: Tue, 16 Jan 2024 02:02:44 GMT
- Title: EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model
- Authors: Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang
- Abstract summary: EmoTalker is an emotionally editable portrait animation approach based on the diffusion model.
An Emotion Intensity Block is introduced to analyze fine-grained emotions and their strengths derived from prompts.
Experiments show the effectiveness of EmoTalker in generating high-quality, emotionally customizable facial expressions.
- Score: 39.14430238946951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the field of talking face generation has attracted
considerable attention, with certain methods adept at generating virtual faces
that convincingly imitate human expressions. However, existing methods face
challenges related to limited generalization, particularly when dealing with
challenging identities. Furthermore, methods for editing expressions are often
confined to a singular emotion, failing to adapt to intricate emotions. To
overcome these challenges, this paper proposes EmoTalker, an emotionally
editable portrait animation approach based on the diffusion model. EmoTalker
modifies the denoising process to ensure preservation of the original
portrait's identity during inference. To enhance emotion comprehension from
text input, an Emotion Intensity Block is introduced to analyze fine-grained
emotions and strengths derived from prompts. Additionally, a crafted dataset is
harnessed to enhance emotion comprehension within prompts. Experiments show the
effectiveness of EmoTalker in generating high-quality, emotionally customizable
facial expressions.
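Since the abstract only names the Emotion Intensity Block at a high level, the following is a minimal, hypothetical sketch of what such a block could look like, assuming it maps a text-prompt embedding to a fine-grained emotion distribution plus a scalar strength. The module layout, dimensions, and emotion set are illustrative assumptions, not EmoTalker's actual implementation.

```python
# Hypothetical sketch of an "Emotion Intensity Block": a small head that maps a
# text-prompt embedding to fine-grained emotion scores and a strength value.
# Dimensions, the emotion set, and the architecture are assumptions made for
# illustration; they are not taken from the EmoTalker paper.
import torch
import torch.nn as nn

EMOTIONS = ["happy", "sad", "angry", "fear", "disgust", "surprise", "neutral"]

class EmotionIntensityBlock(nn.Module):
    def __init__(self, prompt_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(prompt_dim, hidden_dim),
            nn.ReLU(),
        )
        # One head for the emotion-category distribution, one for its strength.
        self.emotion_head = nn.Linear(hidden_dim, len(EMOTIONS))
        self.intensity_head = nn.Linear(hidden_dim, 1)

    def forward(self, prompt_embedding: torch.Tensor):
        h = self.backbone(prompt_embedding)
        emotion_probs = torch.softmax(self.emotion_head(h), dim=-1)
        intensity = torch.sigmoid(self.intensity_head(h))  # strength in [0, 1]
        # The resulting emotion vector would then condition the denoising steps,
        # while the modified denoising process preserves the portrait's identity.
        return emotion_probs, intensity

# Usage with a dummy prompt embedding (e.g., from a frozen text encoder):
if __name__ == "__main__":
    block = EmotionIntensityBlock()
    embedding = torch.randn(1, 768)
    probs, strength = block(embedding)
    print(dict(zip(EMOTIONS, probs.squeeze(0).tolist())), strength.item())
```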
Related papers
- EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [5.954758598327494]
EMOdiffhead is a novel method for emotional talking head video generation.
It enables fine-grained control of emotion categories and intensities.
It achieves state-of-the-art performance compared to other emotion portrait animation methods.
arXiv Detail & Related papers (2024-09-11T13:23:22Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech [34.03787613163788]
EmoSphere-TTS synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech; a hypothetical coordinate sketch of this idea appears after this list.
We propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics.
arXiv Detail & Related papers (2024-06-12T01:40:29Z)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [34.5592743467339]
We propose a visual attribute-guided audio decoupler to generate fine-grained facial animations.
To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module.
Our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization.
arXiv Detail & Related papers (2024-02-02T14:04:18Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [43.09015109281053]
We propose a more flexible and generalized framework for talking face generation.
Specifically, we supplement the emotion style in text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modality into a unified space.
An Emotion-aware Audio-to-3DMM Convertor is proposed to connect the emotion condition and the audio sequence to structural representation.
arXiv Detail & Related papers (2023-05-04T05:59:34Z)
- Expressive Speech-driven Facial Animation with controllable emotions [12.201573788014622]
This paper presents a novel deep learning-based approach for expressive facial animation generation from speech.
It can exhibit wide-spectrum facial expressions with controllable emotion type and intensity.
It enables emotion-controllable facial animation, where the target expression can be continuously adjusted.
arXiv Detail & Related papers (2023-01-05T11:17:19Z)
- Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction [55.47134146639492]
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
Experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
arXiv Detail & Related papers (2021-06-06T06:26:15Z)
- Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)
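As referenced in the EmoSphere-TTS entry above, one way to read a "spherical emotion vector", purely as an illustrative assumption, is as a spherical-coordinate view of a valence-arousal-dominance point around a neutral center, where the radius acts as intensity and the two angles act as emotional style. The sketch below shows only that coordinate change, not the paper's actual formulation.

```python
# Illustrative sketch (not the EmoSphere-TTS implementation): map a
# valence-arousal-dominance (VAD) point, taken relative to a neutral center,
# into spherical coordinates where the radius plays the role of intensity and
# the two angles play the role of emotional style.
import math

def to_spherical_emotion(valence: float, arousal: float, dominance: float,
                         neutral=(0.0, 0.0, 0.0)):
    """Map a VAD point to (radius, azimuth, elevation) around a neutral center."""
    x = valence - neutral[0]
    y = arousal - neutral[1]
    z = dominance - neutral[2]
    radius = math.sqrt(x * x + y * y + z * z)      # emotion intensity
    azimuth = math.atan2(y, x)                     # style angle in the V-A plane
    elevation = math.atan2(z, math.hypot(x, y))    # style angle toward dominance
    return radius, azimuth, elevation

# Scaling only the radius changes the intensity while keeping the style angles
# fixed, which is one way to control style and intensity separately.
r, az, el = to_spherical_emotion(0.6, 0.4, 0.1)
half_intensity = (0.5 * r, az, el)
```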
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.