Related papers: An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment

An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment

URL: http://arxiv.org/abs/2503.16454v1
Date: Fri, 21 Feb 2025 14:26:58 GMT
Title: An Audio-Visual Fusion Emotion Generation Model Based on Neuroanatomical Alignment
Authors: Haidong Wang, Qia Shan, JianHua Zhang, PengFei Xiao, Ao Liu,
Abstract summary: We introduce a novel framework named Audio-Visual Fusion for Brain-like Emotion Learning(AVF-BEL)<n>In contrast to conventional brain-inspired emotion learning methods, this approach improves the audio-visual emotion fusion and generation model.<n>The experimental results indicate a significant improvement in the similarity of the audio-visual fusion emotion learning generation model.
Score: 15.98131469205444
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the field of affective computing, traditional methods for generating emotions predominantly rely on deep learning techniques and large-scale emotion datasets. However, deep learning techniques are often complex and difficult to interpret, and standardizing large-scale emotional datasets are difficult and costly to establish. To tackle these challenges, we introduce a novel framework named Audio-Visual Fusion for Brain-like Emotion Learning(AVF-BEL). In contrast to conventional brain-inspired emotion learning methods, this approach improves the audio-visual emotion fusion and generation model through the integration of modular components, thereby enabling more lightweight and interpretable emotion learning and generation processes. The framework simulates the integration of the visual, auditory, and emotional pathways of the brain, optimizes the fusion of emotional features across visual and auditory modalities, and improves upon the traditional Brain Emotional Learning (BEL) model. The experimental results indicate a significant improvement in the similarity of the audio-visual fusion emotion learning generation model compared to single-modality visual and auditory emotion learning and generation model. Ultimately, this aligns with the fundamental phenomenon of heightened emotion generation facilitated by the integrated impact of visual and auditory stimuli. This contribution not only enhances the interpretability and efficiency of affective intelligence but also provides new insights and pathways for advancing affective computing technology. Our source code can be accessed here: https://github.com/OpenHUTB/emotion}{https://github.com/OpenHUTB/emotion.

Related papers

EmoCAST: Emotional Talking Portrait via Emotive Text Description [56.42674612728354]
EmoCAST is a diffusion-based framework for precise text-driven emotional synthesis.<n>In appearance modeling, emotional prompts are integrated through a text-guided decoupled emotive module.<n>EmoCAST achieves state-of-the-art performance in generating realistic, emotionally expressive, and audio-synchronized talking-head videos.
arXiv Detail & Related papers (2025-08-28T10:02:06Z)
UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries [61.5273479616832]
We propose a unified framework that seamlessly integrates emotional understanding and generation.<n>We show that UniEmo significantly outperforms state-of-the-art methods in both emotional understanding and generation tasks.
arXiv Detail & Related papers (2025-07-31T09:39:27Z)
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity with emotion and cooperating emotions with similar characteristics. We develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation [39.30784838378127]
The generation of talking avatars has achieved significant advancements in precise audio synchronization.<n>Current methods face fundamental challenges, including the lack of frameworks for modeling single basic emotional expressions.<n>We propose the Mixture of Emotion Experts (MoEE) model, which decouples six fundamental emotions to enable the precise synthesis of both singular and compound emotional states.<n>In conjunction with the DH-FaceEmoVid-150 dataset, we demonstrate that the MoEE framework excels in generating complex emotional expressions and nuanced facial details.
arXiv Detail & Related papers (2025-01-03T13:43:21Z)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data.<n>Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge.<n>We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
Emotion Detection through Body Gesture and Face [0.0]
The project addresses the challenge of emotion recognition by focusing on non-facial cues, specifically hands, body gestures, and gestures. Traditional emotion recognition systems mainly rely on facial expression analysis and often ignore the rich emotional information conveyed through body language. The project aims to contribute to the field of affective computing by enhancing the ability of machines to interpret and respond to human emotions in a more comprehensive and nuanced way.
arXiv Detail & Related papers (2024-07-13T15:15:50Z)
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity. Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication. We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning. EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z)
SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images. To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features. We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction. To the best of our knowledge, it is the first time to introduce stimuli selection process into VEA in an end-to-end network. Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
arXiv Detail & Related papers (2021-09-04T08:14:52Z)
Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions. Our framework integrates a contextualized embedding encoder with a multi-head probing model. Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition [2.1485350418225244]
Spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis. We propose a new deep learning-based approach for audio-visual emotion recognition.
arXiv Detail & Related papers (2021-03-16T15:49:15Z)
Knowledge Bridging for Empathetic Dialogue Generation [52.39868458154947]
Lack of external knowledge makes empathetic dialogue systems difficult to perceive implicit emotions and learn emotional interactions from limited dialogue history. We propose to leverage external knowledge, including commonsense knowledge and emotional lexical knowledge, to explicitly understand and express emotions in empathetic dialogue generation.
arXiv Detail & Related papers (2020-09-21T09:21:52Z)
Meta Transfer Learning for Emotion Recognition [42.61707533351803]
We propose a PathNet-based transfer learning method that is able to transfer emotional knowledge learned from one visual/audio emotion domain to another visual/audio emotion domain. Our proposed system is capable of improving the performance of emotion recognition, making its performance substantially superior to the recent proposed fine-tuning/pre-trained models based transfer learning methods.
arXiv Detail & Related papers (2020-06-23T00:25:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.