Related papers: DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition

URL: http://arxiv.org/abs/2312.10579v2
Date: Sat, 31 Aug 2024 12:41:30 GMT
Title: DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition
Authors: Wei Ai, Yuntao Shou, Tao Meng, Nan Yin, Keqin Li
Abstract summary: We propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN) method. It models dialogue relations between speakers and captures latent event relations information. We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of the DER-GCN model.
Score: 14.639340916340801
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the continuous development of deep learning (DL), multimodal dialogue emotion recognition (MDER), an essential branch of DL, has recently received extensive research attention. MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, across different dialogue scenes. However, existing research has focused on modeling contextual semantic information and dialogue relations between speakers while ignoring the impact of event relations on emotion. To tackle this issue, we propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN) method. It models dialogue relations between speakers and captures latent event relation information. Specifically, we construct a weighted multi-relationship graph to simultaneously capture the dependencies between speakers and event relations in a dialogue. Moreover, we introduce a Self-Supervised Masked Graph Autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new Multiple Information Transformer (MIT) to capture the correlation between different relations, which enables better fusion of the multivariate information across relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority-class features. We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the F1 score of emotion recognition.
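The abstract does not spell out DER-GCN's layer equations. As a rough illustration only, the sketch below propagates utterance features over a weighted multi-relation dialogue graph using the generic relational GCN (R-GCN) update, with each edge weight normalised by the total weight of that relation entering the target node. The relation names (`same_speaker`, `other_speaker`, `event`), the toy weights, and the normalisation scheme are assumptions, not the paper's formulation.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Edge = (source j, target i, relation name, edge weight w_ji); names are hypothetical.
Edge = Tuple[int, int, str, float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relational_gcn_layer(
    H: List[Vector], edges: List[Edge], W_self: Matrix, W_rel: Dict[str, Matrix]
) -> List[Vector]:
    """One R-GCN-style update (a generic sketch, not DER-GCN's exact layer):

    h_i' = ReLU(W_self h_i + sum_r sum_{j in N_r(i)} (w_ji / c_ir) W_r h_j),
    where c_ir is the summed weight of relation-r edges entering node i.
    """
    # Per-(target, relation) normaliser c_ir.
    norm: Dict[Tuple[int, str], float] = {}
    for _, i, r, w in edges:
        norm[(i, r)] = norm.get((i, r), 0.0) + w

    # Self-connection term W_self h_i for every node.
    out = [matvec(W_self, h) for h in H]

    # Add the weighted, relation-specific message from each neighbour.
    for j, i, r, w in edges:
        msg = matvec(W_rel[r], H[j])
        scale = w / norm[(i, r)]
        out[i] = [o + scale * m for o, m in zip(out[i], msg)]

    # Element-wise ReLU.
    return [[max(0.0, v) for v in o] for o in out]


# Toy dialogue: utterances 0 and 2 by speaker A, utterance 1 by speaker B.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {
    "same_speaker": [[0.5, 0.0], [0.0, 0.5]],
    "other_speaker": [[0.0, 1.0], [1.0, 0.0]],
    "event": [[1.0, 0.0], [0.0, 1.0]],  # hypothetical latent event relation
}
edges = [
    (0, 2, "same_speaker", 1.0),
    (1, 2, "other_speaker", 1.0),
    (0, 1, "other_speaker", 1.0),
    (0, 2, "event", 0.5),
    (1, 2, "event", 1.5),
]
H_next = relational_gcn_layer(H, edges, W_self, W_rel)
print(H_next)  # [[1.0, 0.0], [0.0, 2.0], [2.75, 1.75]]
```

Keeping a separate matrix per relation lets speaker edges and event edges contribute differently; because of the per-relation normalisation, an edge weight only matters relative to other edges of the same relation entering the same node.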

Related papers

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning [60.44337740854767]
Multimodal representation learning plays an important role in the artificial intelligence domain. We introduce a new approach that models video-text as game players using multivariate cooperative game theory. We extend our original structure into a flexible encoder-decoder framework, enabling the model to adapt to various downstream tasks.
arXiv Detail & Related papers (2024-12-30T14:09:15Z)
Effective Context Modeling Framework for Emotion Recognition in Conversations [2.7175580940471913]
Emotion Recognition in Conversations (ERC) facilitates a deeper understanding of the emotions conveyed by speakers in each utterance within a conversation. Recent Graph Neural Networks (GNNs) have demonstrated their strengths in capturing data relationships. We propose ConxGNN, a novel GNN-based framework designed to capture contextual information in conversations.
arXiv Detail & Related papers (2024-12-21T02:22:06Z)
Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations [8.107561045241445]
We propose an Efficient Long-distance Latent Relation-aware Graph Neural Network (ELR-GNN) for multi-modal emotion recognition in conversations. ELR-GNN achieves state-of-the-art performance on the benchmark IEMOCAP and MELD, with running times reduced by 52% and 35%, respectively.
arXiv Detail & Related papers (2024-06-27T15:54:12Z)
ESIHGNN: Event-State Interactions Infused Heterogeneous Graph Neural Network for Conversational Emotion Recognition [16.800240197327923]
Existing graph-based methods mainly focus on event interactions to comprehend the conversational context. We propose a novel graph-based approach, namely Event-State Interactions infused Heterogeneous Graph Neural Network (ESIHGNN). ESIHGNN incorporates the speaker's emotional state and constructs a heterogeneous event-state interaction graph to model the conversation.
arXiv Detail & Related papers (2024-05-07T02:46:11Z)
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features. Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z)
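Acc-7 and w-F1 presumably denote seven-class accuracy and support-weighted F1 (MELD annotates seven emotion categories); the DER-GCN abstract likewise reports accuracy and F1. A minimal pure-Python sketch of both metrics on toy labels follows. The weighted F1 averages per-class F1 by each class's share of the true labels, which matches how scikit-learn's `f1_score(..., average="weighted")` defines it; the label values are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of utterances whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_f1(y_true, y_pred, label) -> float:
    """F1 for a single class; returns 0.0 when there are no true positives."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred) -> float:
    """Average of per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(n / total * class_f1(y_true, y_pred, c) for c, n in support.items())


y_true = ["neutral", "neutral", "joy", "anger"]
y_pred = ["neutral", "joy", "joy", "neutral"]
acc = accuracy(y_true, y_pred)     # 0.5
wf1 = weighted_f1(y_true, y_pred)  # 5/12, about 0.4167
print(acc, wf1)
```

Because the weights follow class frequency, a model can score well on w-F1 while still doing poorly on rare emotions.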
Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive for Multimodal Emotion Recognition (AR-IIGCN) method. Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces. Secondly, we build a generator and a discriminator for the three modal features through adversarial representation. Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
arXiv Detail & Related papers (2023-12-28T01:57:26Z)
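The summary does not give AR-IIGCN's contrastive objective. One common way to realise inter-modal contrastive learning is an InfoNCE loss in which two modalities of the same utterance form the positive pair and the other utterances in the batch act as negatives. The sketch below assumes that formulation with cosine similarity and a temperature of 0.1; it is not the paper's loss.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; anchors[i] and positives[i] are two modalities of utterance i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[k]) / temperature for k in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the true pair
    return total / n


text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_shuffled = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce(text, audio_aligned)
loss_shuffled = info_nce(text, audio_shuffled)
print(loss_aligned, loss_shuffled)  # aligned pairs give the much smaller loss
```

An intra-modal variant would apply the same loss to two augmented views of a single modality; how AR-IIGCN combines the intra-modal and inter-modal terms is not stated in the summary.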
Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction [2.1261712640167856]
Emotion recognition is a crucial task for human conversation understanding. We propose a Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT). CORECT effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies.
arXiv Detail & Related papers (2023-11-08T07:46:25Z)
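The CORECT summary mentions utterance-level temporal dependencies but not how the graph is built. A common construction in graph-based conversational emotion recognition (used, for example, by DialogueGCN) links each utterance to a window of preceding and following utterances and types each edge by temporal direction and speaker identity. The sketch assumes that construction; the window sizes `past` and `future` and the relation labels are hypothetical.

```python
from typing import List, Sequence, Tuple

# Edge = (source utterance j, target utterance i, relation label)
Edge = Tuple[int, int, str]


def build_temporal_graph(speakers: Sequence[str], past: int = 2, future: int = 1) -> List[Edge]:
    """Connect each utterance i to up to `past` earlier and `future` later utterances."""
    n = len(speakers)
    edges: List[Edge] = []
    for i in range(n):
        for j in range(max(0, i - past), min(n - 1, i + future) + 1):
            if j == i:
                continue  # self-connections are usually handled by a separate weight
            direction = "past" if j < i else "future"
            speaker_rel = "same" if speakers[j] == speakers[i] else "other"
            edges.append((j, i, f"{direction}_{speaker_rel}"))
    return edges


edges = build_temporal_graph(["A", "B", "A"], past=2, future=1)
print(edges)
# -> [(1, 0, 'future_other'), (0, 1, 'past_other'), (2, 1, 'future_other'),
#     (0, 2, 'past_same'), (1, 2, 'past_other')]
```

Typing edges this way yields up to four relation types, each of which could drive a relation-specific weight matrix in a relational GCN layer.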
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task. We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from pre-trained language models (PrLMs). We employ domain-adaptive training strategies to help the model adapt to the dialogue domains. Experimental results show that our method substantially boosts the strong PrLM baselines on four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z)
Modeling Coreference Relations in Visual Dialog [18.926582410644375]
The occurrences of coreference relations in the dialog make it a more challenging task than visual question answering. We propose two soft constraints that can improve the model's ability to resolve coreferences in dialog in an unsupervised way.
arXiv Detail & Related papers (2022-03-06T15:22:24Z)
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities. We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems [133.13117064357425]
We propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation. Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence. Experimental results show that our GRADE significantly outperforms other state-of-the-art metrics on measuring diverse dialogue models.
arXiv Detail & Related papers (2020-10-08T14:07:32Z)
ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation. A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally and then refine the object-object connections globally. Experiments show that the proposed method can significantly improve the quality of dialogue by utilising the contextual information of visual relationships.
arXiv Detail & Related papers (2020-06-15T12:25:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.