Discovering Emotion and Reasoning its Flip in Multi-Party Conversations using Masked Memory Network and Transformer
- URL: http://arxiv.org/abs/2103.12360v2
- Date: Wed, 24 Mar 2021 09:17:35 GMT
- Title: Discovering Emotion and Reasoning its Flip in Multi-Party Conversations using Masked Memory Network and Transformer
- Authors: Shivani Kumar, Anubhav Shrimal, Md Shad Akhtar, Tanmoy Chakraborty
- Abstract summary: We introduce a novel task -- Emotion Flip Reasoning (EFR).
EFR aims to identify past utterances which have triggered one's emotion state to flip at a certain time.
We propose a masked memory network to address the former and a Transformer-based network for the latter task.
- Score: 16.224961520924115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient discovery of emotion states of speakers in a multi-party
conversation is highly important to design human-like conversational agents.
During the conversation, the cognitive state of a speaker often alters due to
certain past utterances, which may lead to a flip in her emotion state.
Therefore, discovering the reasons (triggers) behind one's emotion flip during
conversation is important to explain the emotion labels of individual
utterances. In this paper, along with addressing the task of emotion
recognition in conversations (ERC), we introduce a novel task -- Emotion Flip
Reasoning (EFR) that aims to identify past utterances which have triggered
one's emotion state to flip at a certain time. We propose a masked memory
network to address the former and a Transformer-based network for the latter
task. To this end, we consider MELD, a benchmark emotion recognition dataset in
multi-party conversations for the task of ERC and augment it with new
ground-truth labels for EFR. An extensive comparison with four state-of-the-art
models suggests improved performance of our models on both tasks. We
further present anecdotal evidence and both qualitative and quantitative error
analyses to support the superiority of our models over the baselines.
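To make the modeling idea concrete, below is a minimal sketch of a speaker-masked memory read over the dialogue history, in the spirit of the masked memory network described above; the function, the dot-product scoring, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: reading intra- and inter-speaker memories for ERC.
# Assumes precomputed utterance embeddings; not the authors' actual code.
import torch
import torch.nn.functional as F

def masked_memory_read(query, memory, speaker_ids, query_speaker):
    """query: (d,) current utterance; memory: (t, d) past utterances;
    speaker_ids: (t,) speaker of each memory slot; query_speaker: int."""
    scores = memory @ query                     # (t,) dot-product attention
    same = speaker_ids == query_speaker         # boolean speaker mask

    def read(mask):
        if mask.any():
            weights = F.softmax(scores[mask], dim=0)
            return weights @ memory[mask]       # weighted memory summary
        return torch.zeros_like(query)          # no matching slots

    # Separate reads let the model weigh self- vs. other-speaker history.
    return read(same), read(~same)

# Toy usage: five past utterances, 8-dim embeddings, two speakers.
memory = torch.randn(5, 8)
speakers = torch.tensor([0, 1, 0, 1, 1])
intra, inter = masked_memory_read(torch.randn(8), memory, speakers, 0)
```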
Related papers
- SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations [53.60993109543582]
SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, aims at extracting all pairs of emotions and their corresponding causes from conversations.
Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE).
In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
arXiv Detail & Related papers (2024-05-19T09:59:00Z)
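As an illustration of the pair-extraction format, a minimal TECPE-style record might look as follows; all field names and the toy dialogue are hypothetical, not the shared task's actual schema.

```python
# Hypothetical record for one conversation and its emotion-cause pairs.
conversation = [
    {"id": 1, "speaker": "A", "text": "I lost my keys again.", "emotion": "sadness"},
    {"id": 2, "speaker": "B", "text": "Oh no, not again!", "emotion": "surprise"},
]
# One extracted pair: utterance 2's surprise is triggered by utterance 1.
pairs = [{"emotion_utt": 2, "emotion": "surprise", "cause_utt": 1}]
```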
- ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains [61.50113532215864]
Causal Emotion Entailment (CEE) aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance.
Current works in CEE mainly focus on modeling semantic and emotional interactions in conversations.
We introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations.
arXiv Detail & Related papers (2024-05-17T15:45:08Z)
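A rough sketch of what a step-by-step emotion-cause reasoning prompt in the spirit of ECR-Chain could look like; the step wording is a hypothetical reconstruction, not the paper's actual chain.

```python
# Hypothetical reasoning-chain prompt; the steps are illustrative only.
PROMPT = """Conversation:
{dialogue}

Target: utterance {target_id}, expressed emotion: {emotion}.
Reason step by step:
1. Restate what the target speaker is reacting to.
2. List earlier utterances that could be the stimulus.
3. Select the causal utterance(s) and justify briefly.
Answer with the causal utterance id(s)."""

def build_prompt(dialogue: str, target_id: int, emotion: str) -> str:
    return PROMPT.format(dialogue=dialogue, target_id=target_id, emotion=emotion)
```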
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Emotion Flip Reasoning in Multiparty Conversations [27.884015521888458]
Instigator-based Emotion Flip Reasoning (EFR) aims to identify the instigator behind a speaker's emotion flip within a conversation.
We present MELD-I, a dataset that includes ground-truth EFR instigator labels, which are in line with emotional psychology.
We propose a novel neural architecture called TGIF, which leverages Transformer encoders and stacked GRUs to capture the dialogue context.
arXiv Detail & Related papers (2023-06-24T13:22:02Z)
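The description of TGIF (Transformer encoders plus stacked GRUs over the dialogue context) suggests a layered design along the following lines; every size and the per-utterance trigger head below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a Transformer-plus-stacked-GRU dialogue encoder
# with a per-utterance trigger head.
import torch
import torch.nn as nn

class TGIFSketch(nn.Module):
    def __init__(self, d=128, heads=4, enc_layers=2, gru_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=enc_layers)
        self.gru = nn.GRU(d, d, num_layers=gru_layers, batch_first=True)
        self.head = nn.Linear(d, 1)          # trigger score per utterance

    def forward(self, utt_emb):              # (batch, t, d) utterance embeddings
        ctx = self.encoder(utt_emb)          # global dialogue context
        seq, _ = self.gru(ctx)               # sequential refinement
        return self.head(seq).squeeze(-1)    # (batch, t) trigger logits

logits = TGIFSketch()(torch.randn(1, 6, 128))   # six utterances
```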
- Mimicking the Thinking Process for Emotion Recognition in Conversation with Prompts and Paraphrasing [26.043447749659478]
We propose a novel framework which mimics the thinking process when modeling complex factors.
We first comprehend the conversational context with a history-oriented prompt to selectively gather information from predecessors of the target utterance.
We then model the speaker's background with an experience-oriented prompt to retrieve the similar utterances from all conversations.
arXiv Detail & Related papers (2023-06-11T06:36:19Z)
- Beyond Isolated Utterances: Conversational Emotion Recognition [33.52961239281893]
Speech emotion recognition is the task of recognizing the speaker's emotional state given a recording of their utterance.
We propose several approaches for conversational emotion recognition (CER) by treating it as a sequence labeling task.
We investigated a transformer architecture for CER and compared it with ResNet-34 and BiLSTM architectures in both contextual and context-less scenarios.
arXiv Detail & Related papers (2021-09-13T16:40:35Z)
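The sequence-labeling view can be pictured with a small tagger over precomputed per-utterance embeddings; the BiLSTM below stands in for one of the compared architectures, with all sizes assumed.

```python
# Minimal sketch: conversational emotion recognition as sequence labeling.
# Assumes precomputed per-utterance acoustic embeddings; sizes are illustrative.
import torch
import torch.nn as nn

class CERTagger(nn.Module):
    def __init__(self, d_in=128, d_hid=64, n_emotions=4):
        super().__init__()
        self.rnn = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_hid, n_emotions)

    def forward(self, utts):                 # (batch, t, d_in)
        h, _ = self.rnn(utts)                # context from both directions
        return self.out(h)                   # (batch, t, n_emotions)

# One emotion label per utterance in a 10-turn conversation.
labels = CERTagger()(torch.randn(1, 10, 128)).argmax(-1)
```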
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations.
Unlike models that require additional reference audio as input, our model can predict emotion labels from the input text alone and generate more expressive speech conditioned on the emotion embedding.
In the experiments, we first validate the effectiveness of our dataset through an emotion classification task, then train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
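The text-only emotion prediction and embedding-based conditioning described above could be sketched as follows; the module names, sizes, and the argmax lookup are assumptions, and a real system would typically use gold labels at training time.

```python
# Hypothetical sketch: predict an emotion label from a text representation
# and condition downstream synthesis on its embedding.
import torch
import torch.nn as nn

class EmotionConditioner(nn.Module):
    def __init__(self, n_emotions=5, d_text=256, d_emo=64):
        super().__init__()
        self.classifier = nn.Linear(d_text, n_emotions)  # text-only prediction
        self.table = nn.Embedding(n_emotions, d_emo)
        self.proj = nn.Linear(d_text + d_emo, d_text)

    def forward(self, text_repr):            # (batch, d_text)
        logits = self.classifier(text_repr)
        emo = logits.argmax(-1)              # predicted label (gold in training)
        emb = self.table(emo)                # (batch, d_emo) emotion embedding
        return self.proj(torch.cat([text_repr, emb], dim=-1)), logits

conditioned, logits = EmotionConditioner()(torch.randn(2, 256))
```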
- AdCOFE: Advanced Contextual Feature Extraction in Conversations for emotion classification [0.29360071145551075]
The proposed Advanced Contextual Feature Extraction (AdCOFE) model addresses challenges in capturing contextual information for emotion classification in conversations.
Experiments on the Emotion recognition in conversations dataset show that AdCOFE is beneficial in capturing emotions in conversations.
arXiv Detail & Related papers (2021-04-09T17:58:19Z)
- Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset [84.53659233967225]
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
We propose a novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN).
We show that the proposed framework achieves remarkable performance by consistently outperforming the baseline framework.
arXiv Detail & Related papers (2020-10-28T07:16:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.