DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in
the Conversation
- URL: http://arxiv.org/abs/2010.07637v1
- Date: Thu, 15 Oct 2020 10:10:41 GMT
- Title: DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in
the Conversation
- Authors: Yuzhao Mao, Qi Sun, Guang Liu, Xiaojie Wang, Weiguo Gao, Xuan Li,
Jianping Shen
- Abstract summary: We propose the DialogueTransformer to explore the differentiated emotional behaviors from the intra- and inter-modal perspectives.
For intra-modal, we construct a novel Hierarchical Transformer that can easily switch between sequential and feed-forward structures.
For inter-modal, we constitute a novel Multi-Grained Interactive Fusion that applies both neuron- and vector-grained feature interactions.
- Score: 20.691806885663848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion Recognition in Conversations (ERC) is essential for building
empathetic human-machine systems. Existing studies on ERC primarily focus on
summarizing the context information in a conversation; however, they ignore the
differentiated emotional behaviors within and across different modalities.
Designing appropriate strategies that fit the differentiated multi-modal
emotional behaviors can produce more accurate emotional predictions. Thus, we
propose the DialogueTransformer to explore the differentiated emotional
behaviors from the intra- and inter-modal perspectives. For intra-modal, we
construct a novel Hierarchical Transformer that can easily switch between
sequential and feed-forward structures according to the differentiated context
preference within each modality. For inter-modal, we constitute a novel
Multi-Grained Interactive Fusion that applies both neuron- and vector-grained
feature interactions to learn the differentiated contributions across all
modalities. Experimental results show that DialogueTRM outperforms the
state-of-the-art by a significant margin on three benchmark datasets.
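
The abstract describes two components: a per-modality Hierarchical Transformer that switches between a sequential (context-aware) and a feed-forward (context-free) structure, and a Multi-Grained Interactive Fusion that mixes neuron- and vector-grained interactions across modalities. The sketch below is one possible reading of that description in PyTorch, not the authors' released code; the class names, layer sizes, gating/attention choices, and the assumption that only text benefits from conversational context are illustrative.

```python
# Hedged sketch of the two components described in the DialogueTRM abstract.
# All names, shapes, and design choices below are assumptions for illustration.
import torch
import torch.nn as nn


class IntraModalEncoder(nn.Module):
    """Per-modality encoder that can switch between a sequential (context-aware)
    Transformer encoder and a feed-forward (context-free) block, depending on how
    much conversational context the modality is assumed to prefer."""

    def __init__(self, dim: int, n_heads: int = 4, use_context: bool = True):
        super().__init__()
        self.use_context = use_context
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.sequential = nn.TransformerEncoder(layer, num_layers=2)
        self.feed_forward = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                      # x: (batch, n_utterances, dim)
        if self.use_context:
            return self.sequential(x)          # attend across the utterance sequence
        return self.feed_forward(x)            # treat each utterance independently


class MultiGrainedFusion(nn.Module):
    """Fuses per-modality utterance features with both neuron-grained (element-wise
    gates over individual feature dimensions) and vector-grained (attention weights
    over whole modality vectors) interactions, then concatenates the two views."""

    def __init__(self, dim: int, n_modalities: int = 3):
        super().__init__()
        self.neuron_gate = nn.Linear(n_modalities * dim, n_modalities * dim)
        self.vector_score = nn.Linear(dim, 1)

    def forward(self, feats):                  # list of (batch, n_utt, dim), one per modality
        stacked = torch.stack(feats, dim=2)    # (batch, n_utt, n_mod, dim)
        flat = stacked.flatten(2)              # (batch, n_utt, n_mod * dim)

        # Neuron-grained: a sigmoid gate weights every individual feature dimension.
        neuron_view = torch.sigmoid(self.neuron_gate(flat)) * flat

        # Vector-grained: softmax attention weights each modality vector as a whole.
        weights = torch.softmax(self.vector_score(stacked), dim=2)   # (batch, n_utt, n_mod, 1)
        vector_view = (weights * stacked).sum(dim=2)                 # (batch, n_utt, dim)

        return torch.cat([neuron_view, vector_view], dim=-1)


# Usage: text kept context-aware, audio/visual treated as context-free (an assumption).
dim, modalities = 128, {"text": True, "audio": False, "visual": False}
encoders = {m: IntraModalEncoder(dim, use_context=c) for m, c in modalities.items()}
fusion = MultiGrainedFusion(dim, n_modalities=len(modalities))
batch = {m: torch.randn(2, 10, dim) for m in modalities}             # 2 dialogues, 10 utterances
fused = fusion([encoders[m](batch[m]) for m in modalities])
classifier = nn.Linear(fused.size(-1), 7)                            # e.g. 7 emotion classes
emotion_logits = classifier(fused)                                   # (batch, n_utterances, 7)
```

The switch between sequential and feed-forward structures is exposed here as a simple flag so the same module can be instantiated per modality; how DialogueTRM actually decides the context preference is described only at a high level in the abstract.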
Related papers
- Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning [40.101313334772016]
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information.
Previous ERC methods relied on simple connections for cross-modal fusion.
We propose a cross-modal fusion emotion prediction network based on vector connections.
arXiv Detail & Related papers (2024-05-28T07:22:30Z)
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z)
- AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations [39.79734528362605]
The Multimodal Attention Network captures cross-modal interactions at various levels of spatial abstraction.
The AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
arXiv Detail & Related papers (2024-01-26T19:17:05Z)
- Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive for Multimodal Emotion Recognition (AR-IIGCN) method.
Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces.
Secondly, we build a generator and a discriminator for the three modal features through adversarial representation.
Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
arXiv Detail & Related papers (2023-12-28T01:57:26Z)
- A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations [15.77747948751497]
We propose a transformer-based model with self-distillation (SDT) for the task.
The proposed model captures intra- and inter-modal interactions by utilizing intra- and inter-modal transformers.
We introduce self-distillation to transfer knowledge of hard and soft labels from the proposed model to each modality (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-10-31T14:33:30Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to capture attentive contextual dependency and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
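
As referenced in the self-distillation (SDT) entry above, the sketch below illustrates the general hard/soft-label distillation recipe that entry describes: a fused multimodal head acts as the teacher and each unimodal head as a student. This is not the SDT authors' code; the temperature, loss weight, and tensor shapes are illustrative assumptions.

```python
# Hedged sketch of hard/soft-label self-distillation from a fused multimodal head
# to per-modality heads. Hyperparameters and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F


def self_distillation_loss(fused_logits, unimodal_logits, labels, temperature=2.0, alpha=0.5):
    """fused_logits: (batch, n_classes) from the multimodal model (teacher).
    unimodal_logits: list of (batch, n_classes), one per modality (students)."""
    loss = F.cross_entropy(fused_logits, labels)            # teacher trained on hard labels
    soft_targets = F.softmax(fused_logits.detach() / temperature, dim=-1)
    for logits in unimodal_logits:
        hard = F.cross_entropy(logits, labels)               # student: hard-label term
        soft = F.kl_div(                                      # student: soft-label term
            F.log_softmax(logits / temperature, dim=-1),
            soft_targets,
            reduction="batchmean",
        ) * temperature ** 2
        loss = loss + alpha * hard + (1 - alpha) * soft
    return loss


# Example with random tensors: 3 modalities, 7 emotion classes.
labels = torch.randint(0, 7, (4,))
loss = self_distillation_loss(torch.randn(4, 7), [torch.randn(4, 7) for _ in range(3)], labels)
```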
This list is automatically generated from the titles and abstracts of the papers on this site.