TSAM: A Two-Stream Attention Model for Causal Emotion Entailment
- URL: http://arxiv.org/abs/2203.00819v1
- Date: Wed, 2 Mar 2022 02:11:41 GMT
- Title: TSAM: A Two-Stream Attention Model for Causal Emotion Entailment
- Authors: Duzhen Zhang, Zhen Yang, Fandong Meng, Xiuyi Chen, Jie Zhou
- Abstract summary: Causal Emotion Entailment (CEE) aims to discover the potential causes behind an emotion in a conversational utterance.
We classify multiple utterances synchronously to capture the correlations between utterances in a global view.
We propose a Two-Stream Attention Model (TSAM) to effectively model the speaker's emotional influences in the conversational history.
- Score: 50.07800752967995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal Emotion Entailment (CEE) aims to discover the potential causes behind
an emotion in a conversational utterance. Previous works formalize CEE as
independent utterance pair classification problems, with emotion and speaker
information neglected. From a new perspective, this paper considers CEE in a
joint framework. We classify multiple utterances synchronously to capture the
correlations between utterances in a global view and propose a Two-Stream
Attention Model (TSAM) to effectively model the speaker's emotional influences
in the conversational history. Specifically, the TSAM comprises three modules:
Emotion Attention Network (EAN), Speaker Attention Network (SAN), and
interaction module. The EAN and SAN incorporate emotion and speaker information
in parallel, and the subsequent interaction module effectively interchanges
relevant information between the EAN and SAN via a mutual BiAffine
transformation. Experimental results on a benchmark dataset demonstrate that
our model achieves new State-Of-The-Art (SOTA) performance and outperforms the
baselines by a substantial margin.
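The abstract says only that the EAN and SAN exchange relevant information through a mutual BiAffine transformation; the paper's exact formulation is not reproduced here. The snippet below is a minimal PyTorch-style sketch of one plausible reading, in which each stream attends to the other through a learned bilinear affinity over all utterances of a conversation. Module, variable, and dimension names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MutualBiAffine(nn.Module):
    """Hypothetical mutual BiAffine interaction between two utterance-level streams.

    H_e: emotion-stream states, shape (batch, num_utterances, d)
    H_s: speaker-stream states, shape (batch, num_utterances, d)
    """

    def __init__(self, d: int):
        super().__init__()
        # One bilinear map per direction of the exchange.
        self.W_es = nn.Parameter(torch.empty(d, d))
        self.W_se = nn.Parameter(torch.empty(d, d))
        nn.init.xavier_uniform_(self.W_es)
        nn.init.xavier_uniform_(self.W_se)

    def forward(self, H_e: torch.Tensor, H_s: torch.Tensor):
        # Affinity between every pair of utterances across the two streams.
        A_es = torch.softmax(H_e @ self.W_es @ H_s.transpose(1, 2), dim=-1)
        A_se = torch.softmax(H_s @ self.W_se @ H_e.transpose(1, 2), dim=-1)
        # Each stream is updated with information gathered from the other stream.
        return H_e + A_es @ H_s, H_s + A_se @ H_e

# Usage: classify all utterances of one conversation synchronously.
batch, n_utt, d = 2, 8, 256
H_e = torch.randn(batch, n_utt, d)   # emotion-aware states (from an EAN-like encoder)
H_s = torch.randn(batch, n_utt, d)   # speaker-aware states (from a SAN-like encoder)
H_e, H_s = MutualBiAffine(d)(H_e, H_s)
logits = nn.Linear(2 * d, 2)(torch.cat([H_e, H_s], dim=-1))  # cause / not-cause per utterance
print(logits.shape)  # torch.Size([2, 8, 2])
```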
Related papers
- BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks [2.9873893715462176]
This work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation.
By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation.
Our proposed method achieved state-of-the-art (SOTA) results on three widely used benchmark datasets.
arXiv Detail & Related papers (2024-07-05T06:25:34Z)
- MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis [70.06396781553191]
Multimodal Emotional Text-to-Speech System (MM-TTS) is a unified framework that leverages emotional cues from multiple modalities to generate highly expressive and emotionally resonant speech.
MM-TTS consists of two key components: the Emotion Prompt Alignment Module (EP-Align), which employs contrastive learning to align emotional features across text, audio, and visual modalities, and the Emotion Embedding-Induced TTS (EMI-TTS), which integrates the aligned emotional embeddings with state-of-the-art TTS models to synthesize speech that accurately reflects the intended emotions.
arXiv Detail & Related papers (2024-04-29T03:19:39Z)
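EP-Align is described above only as contrastive alignment of emotional features across modalities; the details are in the linked paper. As a hedged illustration of that general idea, the sketch below computes a symmetric InfoNCE-style loss between paired text and audio emotion embeddings. The function name, the two-modality restriction, and the temperature value are assumptions made for brevity.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb, audio_emb, temperature: float = 0.07):
    """Symmetric InfoNCE-style loss over a batch of paired emotion embeddings.

    text_emb, audio_emb: (batch, d) projections of the same utterances in two modalities.
    Matching pairs share a row index; all other rows in the batch act as negatives.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = text_emb @ audio_emb.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Align text -> audio and audio -> text symmetrically.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Example: 16 paired utterances with 128-dim emotion embeddings.
loss = contrastive_alignment_loss(torch.randn(16, 128), torch.randn(16, 128))
```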
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement [8.17164107060944]
Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field.
Existing ERC methods face challenges in achieving generalization to diverse scenarios due to insufficient modeling of context.
We present a Hybrid Continuous Attributive Network (HCAN) to address these issues from the perspective of emotional continuation and emotional attribution.
arXiv Detail & Related papers (2023-09-18T14:18:16Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for the ERC task.
Our EmotionIC consists of three main components, i.e., Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF).
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z)
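The summary above names Identity Masked Multi-Head Attention but does not spell it out. As a hedged guess at the general idea, the sketch below builds boolean masks that separate same-speaker from other-speaker utterance pairs, which is one common way to split intra- and inter-speaker dependencies in conversational attention. All names are illustrative and not taken from the paper.

```python
import torch

def speaker_identity_masks(speaker_ids: torch.Tensor):
    """Build speaker-identity masks from per-utterance speaker ids.

    speaker_ids: (batch, num_utterances) integer speaker labels.
    Returns two boolean masks of shape (batch, num_utterances, num_utterances):
    intra[i, j] is True when utterances i and j share a speaker; inter is its complement.
    """
    same = speaker_ids.unsqueeze(2) == speaker_ids.unsqueeze(1)
    return same, ~same

# Example: one conversation with five utterances and two speakers (A=0, B=1).
speakers = torch.tensor([[0, 1, 0, 0, 1]])
intra, inter = speaker_identity_masks(speakers)
# Either mask can be turned into an attention mask; note that torch's
# MultiheadAttention treats True as "do not attend", so pass the complement
# of the positions you want each head to keep.
print(intra[0].int())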
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bidirectional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
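The entry above mentions segment-level VGGish features fed to an attentive bidirectional GRU; the exact architecture is in the linked paper. Below is a minimal sketch of that general pattern under stated assumptions: a BiGRU over per-segment 128-dim embeddings (VGGish's output size) followed by additive attention pooling and an emotion classifier. Hidden sizes, class count, and names are placeholders.

```python
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    """BiGRU over segment embeddings with additive attention pooling (illustrative only)."""

    def __init__(self, in_dim: int = 128, hidden: int = 64, num_emotions: int = 4):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)            # scores each segment
        self.classifier = nn.Linear(2 * hidden, num_emotions)

    def forward(self, segments: torch.Tensor):
        # segments: (batch, num_segments, in_dim), e.g. VGGish-style embeddings per audio segment.
        states, _ = self.gru(segments)                  # (batch, num_segments, 2 * hidden)
        weights = torch.softmax(self.attn(states), dim=1)
        utterance = (weights * states).sum(dim=1)       # attention-pooled utterance representation
        return self.classifier(utterance)

# Example: 4 utterances, each split into 10 segments with 128-dim embeddings.
logits = AttentiveBiGRU()(torch.randn(4, 10, 128))
print(logits.shape)  # torch.Size([4, 4])
```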
- Modeling Intention, Emotion and External World in Dialogue Systems [14.724751780218297]
We propose a RelAtion Interaction Network (RAIN) to jointly model mutual relationships and explicitly integrate historical intention information.
The experiments on the dataset show that our model can take full advantage of the intention, emotion and action between individuals.
arXiv Detail & Related papers (2022-02-14T04:10:34Z)
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [77.59549450705384]
In dialog systems, dialog act recognition and sentiment classification are two correlated tasks.
Most existing systems either treat them as separate tasks or simply model the two tasks jointly.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.