Generative Emotion Cause Explanation in Multimodal Conversations
- URL: http://arxiv.org/abs/2411.02430v1
- Date: Fri, 01 Nov 2024 09:16:30 GMT
- Title: Generative Emotion Cause Explanation in Multimodal Conversations
- Authors: Lin Wang, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang,
- Abstract summary: We propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE).
It aims to generate a detailed explanation of the emotional cause of the target utterance within a multimodal conversation scenario.
A novel approach, FAME-Net, is proposed that harnesses the power of Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos.
- Score: 23.39751445330256
- Abstract: Multimodal conversation, a crucial form of human communication, carries rich emotional content, making the exploration of the causes of emotions within it a research endeavor of significant importance. However, existing research on the causes of emotions typically uses clause selection methods to locate the reason utterance, without providing a detailed explanation of the emotional causes. In this paper, we propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE), aiming to generate a detailed explanation of the emotional cause of the target utterance within a multimodal conversation scenario. Building upon the MELD dataset, we develop a new dataset (ECEM) that integrates video clips with detailed explanations of character emotions, facilitating an in-depth examination of the causal factors behind emotional expressions in multimodal conversations. A novel approach, FAME-Net, is further proposed that harnesses the power of Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos. By exploiting the contagion effect of facial emotions, FAME-Net effectively captures the emotional causes of individuals engaged in conversations. Our experimental results on the newly constructed dataset show that FAME-Net significantly outperforms several excellent large language model baselines. Code and dataset are available at https://github.com/3222345200/ECEMdataset.git
Related papers
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains [61.50113532215864]
Causal Emotion Entailment (CEE) aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance.
Current works in CEE mainly focus on modeling semantic and emotional interactions in conversations.
We introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations.
arXiv Detail & Related papers (2024-05-17T15:45:08Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Multi-Task Learning Framework for Extracting Emotion Cause Span and Entailment in Conversations [3.2260643152341095]
We propose neural models to extract emotion cause span and entailment in conversations.
MuTEC is an end-to-end Multi-Task learning framework for extracting emotions, emotion cause, and entailment in conversations.
arXiv Detail & Related papers (2022-11-07T18:14:45Z)
- A Multi-turn Machine Reading Comprehension Framework with Rethink Mechanism for Emotion-Cause Pair Extraction [6.6564045064972825]
Emotion-cause pair extraction (ECPE) is an emerging task in emotion cause analysis.
We propose a Multi-turn MRC framework with Rethink mechanism (MM-R) to tackle the ECPE task.
Our framework can model complicated relations between emotions and causes while avoiding generating the pairing matrix.
arXiv Detail & Related papers (2022-09-16T14:38:58Z)
- M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
We propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities.
It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.
The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data.
arXiv Detail & Related papers (2022-06-05T14:18:58Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction to widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation [25.808037796936766]
In a real-world conversation, we instinctively perceive emotions from multi-source information.
We propose a heterogeneous graph-based model for emotional conversation generation.
Experimental results show that our model can effectively perceive emotions from multi-source knowledge.
arXiv Detail & Related papers (2020-12-09T06:09:31Z)