Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2507.15205v2
- Date: Thu, 24 Jul 2025 05:15:18 GMT
- Title: Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
- Authors: Xinran Li, Xiujuan Xu, Jiaqi Qiao
- Abstract summary: This paper proposes a novel multimodal approach to Emotion Recognition in Conversation (ERC). It constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.
- Score: 3.031375888004876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To ensure that long- and short-distance features are as distinct as possible in representation while enabling mutual influence between the two modules, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) to address the challenge of data imbalance. By computing the similarity between different emotions to emphasize the shifts in similar emotions, we design a "weighted emotional shift" metric and develop a difficulty measurer, enabling a training process that prioritizes learning easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.
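The abstract describes three mechanisms without giving their exact form: a BiAffine module coupling long- and short-distance features, a Differential Regularizer keeping the two representations distinct, and a "weighted emotional shift" difficulty metric for curriculum learning. The following is a minimal, hypothetical sketch of how such components might look; the function names, the squared-cosine penalty, and the similarity matrix are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def biaffine_interaction(h_long, h_short, W, b=0.0):
    """Bilinear coupling h_long^T W h_short + b, letting the long- and
    short-distance feature vectors influence each other (hypothetical form)."""
    return np.tanh(h_long @ W @ h_short + b)

def differential_regularizer(h_long, h_short, eps=1e-8):
    """Penalty that grows as the two feature vectors align, pushing the
    long- and short-distance representations apart. Squared cosine
    similarity is an illustrative choice of penalty."""
    cos = h_long @ h_short / (np.linalg.norm(h_long) * np.linalg.norm(h_short) + eps)
    return float(cos ** 2)

def weighted_emotion_shift(prev_label, curr_label, sim):
    """Difficulty of an utterance: 0 when the emotion does not change;
    otherwise weighted by the similarity between the two emotions, so
    shifts between similar emotions count as harder (hypothetical metric)."""
    if prev_label == curr_label:
        return 0.0
    return float(sim[prev_label, curr_label])
```

A difficulty measurer in this spirit would score each training sample with `weighted_emotion_shift` and schedule low-difficulty samples before high-difficulty ones, matching the easy-to-hard training order the abstract describes.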
Related papers
- GeoMM: On Geodesic Perspective for Multi-modal Learning [55.41612200877861]
This paper introduces geodesic distance as a novel distance metric in multi-modal learning for the first time. Our approach incorporates a comprehensive series of strategies to adapt geodesic distance to current multimodal learning settings.
arXiv Detail & Related papers (2025-05-16T13:12:41Z) - Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z) - Effective Context Modeling Framework for Emotion Recognition in Conversations [2.7175580940471913]
Emotion Recognition in Conversations (ERC) facilitates a deeper understanding of the emotions conveyed by speakers in each utterance within a conversation. Recent Graph Neural Networks (GNNs) have demonstrated their strengths in capturing data relationships. We propose ConxGNN, a novel GNN-based framework designed to capture contextual information in conversations.
arXiv Detail & Related papers (2024-12-21T02:22:06Z) - Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations [8.107561045241445]
We propose an Efficient Long-distance Latent Relation-aware Graph Neural Network (ELR-GNN) for multi-modal emotion recognition in conversations.
ELR-GNN achieves state-of-the-art performance on the IEMOCAP and MELD benchmarks, with running times reduced by 52% and 35%, respectively.
arXiv Detail & Related papers (2024-06-27T15:54:12Z) - Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition [2.4660652494309936]
MultiDAG+CL is a novel approach for Multimodal Emotion Recognition in Conversation (ERC).
The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance.
Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models.
arXiv Detail & Related papers (2024-02-27T07:28:05Z) - DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition [14.639340916340801]
We propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN) method.
It models dialogue relations between speakers and captures latent event relations information.
We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of the DER-GCN model.
arXiv Detail & Related papers (2023-12-17T01:49:40Z) - Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques [0.0]
This research proposes an ensemble model that combines Graph Convolutional Networks (GCN) for processing textual data and the HuBERT transformer for analyzing audio signals.
By combining GCN and HuBERT, our ensemble model can leverage the strengths of both approaches.
Results indicate that the combined model can overcome the limitations of traditional methods, leading to enhanced accuracy in recognizing emotions from speech.
arXiv Detail & Related papers (2023-08-04T06:20:42Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model MEmoBERT for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Dynamic Graph Modeling of Simultaneous EEG and Eye-tracking Data for Reading Task Identification [79.41619843969347]
We present a new approach, that we call AdaGTCN, for identifying human reader intent from Electroencephalogram (EEG) and Eye movement (EM) data.
Our method, Adaptive Graph Temporal Convolution Network (AdaGTCN), uses an Adaptive Graph Learning Layer and Deep Neighborhood Graph Convolution Layer.
We compare our approach with several baselines to report an improvement of 6.29% on the ZuCo 2.0 dataset, along with extensive ablation experiments.
arXiv Detail & Related papers (2021-02-21T18:19:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content listed here (including all information) and is not responsible for any consequences of its use.