Dynamic Graph Neural ODE Network for Multi-modal Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2412.02935v2
- Date: Mon, 27 Jan 2025 02:01:59 GMT
- Title: Dynamic Graph Neural ODE Network for Multi-modal Emotion Recognition in Conversation
- Authors: Yuntao Shou, Tao Meng, Wei Ai, Keqin Li
- Abstract summary: We propose a Dynamic Graph Neural Ordinary Differential Equation Network (DGODE) for multimodal emotion recognition in conversation (MERC).
The proposed DGODE combines the dynamic changes of emotions to capture the temporal dependency of speakers' emotions.
Experiments on two publicly available multimodal emotion recognition datasets demonstrate that the proposed DGODE model has superior performance compared to various baselines.
- Score: 14.158939954453933
- Abstract: Multimodal emotion recognition in conversation (MERC) refers to identifying and classifying human emotional states by combining data from multiple modalities (e.g., audio, images, text, and video). Most existing MERC methods use GCNs to improve performance, but GCNs are prone to overfitting and cannot capture the temporal dependency of the speaker's emotions. To address these problems, we propose a Dynamic Graph Neural Ordinary Differential Equation Network (DGODE) for MERC, which models the dynamic changes of emotions to capture the temporal dependency of speakers' emotions and effectively alleviates the overfitting problem of GCNs. Technically, the key idea of DGODE is to utilize an adaptive mixhop mechanism to improve the generalization ability of GCNs and a graph ODE evolution network to characterize the continuous dynamics of node representations over time and capture temporal dependencies. Extensive experiments on two publicly available multimodal emotion recognition datasets demonstrate that DGODE outperforms various baselines. Furthermore, DGODE also alleviates the over-smoothing problem, enabling the construction of deep GCN networks.
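The abstract names two mechanisms: mixhop aggregation over several powers of the adjacency matrix, and a graph ODE that evolves node states continuously in time. The sketch below illustrates both ideas; the class names, the fixed-step Euler solver, and the stabilizing decay term are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class MixHop(nn.Module):
    """Aggregate neighborhoods at several hop distances (A^0 x, A^1 x, A^2 x)."""
    def __init__(self, dim, hops=(0, 1, 2)):
        super().__init__()
        self.hops = hops
        self.proj = nn.Linear(dim * len(hops), dim)

    def forward(self, x, adj):
        outs, h = [], x
        for k in self.hops:
            h = x if k == 0 else adj @ h      # apply one more power of A
            outs.append(h)
        return self.proj(torch.cat(outs, dim=-1))

class GraphODEFunc(nn.Module):
    """dx/dt = tanh(MixHop(x, A)) - x; the decay term keeps dynamics stable."""
    def __init__(self, dim):
        super().__init__()
        self.mix = MixHop(dim)

    def forward(self, x, adj):
        return torch.tanh(self.mix(x, adj)) - x

class DGODELayer(nn.Module):
    """Evolve utterance-node states from t=0 to t=1 with fixed-step Euler."""
    def __init__(self, dim, steps=8):
        super().__init__()
        self.func = GraphODEFunc(dim)
        self.steps = steps

    def forward(self, x, adj):
        dt = 1.0 / self.steps
        for _ in range(self.steps):
            x = x + dt * self.func(x, adj)    # Euler update x(t + dt)
        return x

# Toy usage: 5 utterance nodes with 16-dim fused multimodal features.
x = torch.randn(5, 16)
adj = torch.softmax(torch.randn(5, 5), dim=-1)  # row-normalized conversation graph
print(DGODELayer(16)(x, adj).shape)             # torch.Size([5, 16])
```

A practical version would swap the Euler loop for an adaptive solver and derive the conversation graph from speaker turns rather than random values.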
Related papers
- Effective Context Modeling Framework for Emotion Recognition in Conversations [2.7175580940471913]
Emotion Recognition in Conversations (ERC) facilitates a deeper understanding of the emotions conveyed by speakers in each utterance within a conversation.
Recent Graph Neural Networks (GNNs) have demonstrated their strengths in capturing data relationships.
We propose ConxGNN, a novel GNN-based framework designed to capture contextual information in conversations.
arXiv Detail & Related papers (2024-12-21T02:22:06Z)
- SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition [14.645598552036908]
Multimodal Emotion Recognition in Conversations (MERC) aims to classify utterance emotions using textual, auditory, and visual modal features.
Most existing MERC methods assume each utterance has complete modalities, overlooking the common issue of incomplete modalities in real-world scenarios.
We propose a Spectral Domain Reconstruction Graph Neural Network (SDR-GNN) for incomplete multimodal learning in conversational emotion recognition.
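The summary names spectral-domain reconstruction without describing its mechanics. As a purely hypothetical illustration of the generic idea, the sketch below low-pass filters utterance features in the graph Fourier basis and uses the smoothed signal to fill in utterances whose modality is missing; every name and design choice here is an assumption, not SDR-GNN's actual method.

```python
import torch

def spectral_reconstruct(x, adj, missing_mask, keep_frac=0.5):
    """x: (N, d) one modality's features, zeroed where missing.
    adj: (N, N) symmetric conversation graph. missing_mask: (N,) bool."""
    deg = adj.sum(-1)
    lap = torch.diag(deg) - adj                 # combinatorial graph Laplacian
    evals, evecs = torch.linalg.eigh(lap)       # graph Fourier basis (ascending freq.)
    x_hat = evecs.T @ x                         # transform to spectral domain
    k = int(keep_frac * len(evals))
    x_hat[k:] = 0                               # low-pass: drop high frequencies
    smoothed = evecs @ x_hat                    # back to vertex domain
    out = x.clone()
    out[missing_mask] = smoothed[missing_mask]  # reconstruct missing utterances
    return out

# Toy usage: 6 utterances, 8-dim audio features, utterance 3 missing audio.
adj = torch.rand(6, 6); adj = (adj + adj.T) / 2
mask = torch.zeros(6, dtype=torch.bool); mask[3] = True
x = torch.randn(6, 8); x[mask] = 0
print(spectral_reconstruct(x, adj, mask).shape)  # torch.Size([6, 8])
```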
arXiv Detail & Related papers (2024-11-29T16:31:50Z) - Capturing Spectral and Long-term Contextual Information for Speech
Emotion Recognition Using Deep Learning Techniques [0.0]
This research proposes an ensemble model that combines Graph Convolutional Networks (GCN) for processing textual data and the HuBERT transformer for analyzing audio signals.
By combining GCN and HuBERT, our ensemble model can leverage the strengths of both approaches.
Results indicate that the combined model can overcome the limitations of traditional methods, leading to enhanced accuracy in recognizing emotions from speech.
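A minimal late-fusion sketch of the ensemble described above, under the simplest assumptions: a text branch with one GCN propagation step and an audio branch standing in for HuBERT (here just mean-pooled frame features), with class logits averaged at the end.

```python
import torch
import torch.nn as nn

class TinyGCNBranch(nn.Module):
    """Stand-in text branch: one graph convolution + mean pooling + classifier."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.w = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, adj):               # x: (N, d) node feats, adj: (N, N)
        h = torch.relu(adj @ self.w(x))      # one GCN propagation step
        return self.head(h.mean(0))          # graph-level logits

class AudioBranch(nn.Module):
    """Stand-in for HuBERT: mean-pool frame features, then classify."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)

    def forward(self, frames):               # frames: (T, d) acoustic features
        return self.head(frames.mean(0))

def late_fusion(logits_text, logits_audio, w=0.5):
    return w * logits_text + (1 - w) * logits_audio  # weighted logit average

text, audio = TinyGCNBranch(16, 4), AudioBranch(32, 4)
x, adj, frames = torch.randn(5, 16), torch.eye(5), torch.randn(100, 32)
print(late_fusion(text(x, adj), audio(frames)).shape)  # torch.Size([4])
```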
arXiv Detail & Related papers (2023-08-04T06:20:42Z)
- EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK).
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z)
- MSA-GCN: Multiscale Adaptive Graph Convolution Network for Gait Emotion Recognition [6.108523790270448]
We present a novel Multiscale Adaptive Graph Convolution Network (MSA-GCN) to recognize emotions from gait.
In our model, an adaptive selective spatial-temporal convolution is designed to select the convolution kernel dynamically to obtain the soft spatio-temporal features of different emotions.
Compared with previous state-of-the-art methods, the proposed method achieves the best performance on two public datasets.
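One common way to realize dynamic kernel selection, shown here as an assumption rather than the paper's exact design, is soft attention over parallel temporal convolutions with different kernel sizes:

```python
import torch
import torch.nn as nn

class AdaptiveTemporalConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.gate = nn.Linear(channels, len(kernel_sizes))  # kernel-selection scores

    def forward(self, x):                      # x: (B, C, T) joint features over time
        scores = self.gate(x.mean(-1))          # (B, K) from time-pooled features
        alpha = torch.softmax(scores, dim=-1)   # soft selection over kernels
        outs = torch.stack([c(x) for c in self.convs], dim=1)  # (B, K, C, T)
        return (alpha[:, :, None, None] * outs).sum(1)         # weighted mixture

x = torch.randn(2, 16, 30)                # 2 gait sequences, 16 channels, 30 frames
print(AdaptiveTemporalConv(16)(x).shape)  # torch.Size([2, 16, 30])
```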
arXiv Detail & Related papers (2022-09-19T13:07:16Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
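A sketch of the dual-head idea under stated assumptions: one head processes the full-rate sequence, the other a temporally pooled copy, and their pooled outputs are concatenated for classification. The spatial step is a single graph aggregation over the joint adjacency; all layer choices are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadSTGraph(nn.Module):
    def __init__(self, channels, n_classes):
        super().__init__()
        self.fine = nn.Conv1d(channels, channels, 3, padding=1)    # full-rate head
        self.coarse = nn.Conv1d(channels, channels, 3, padding=1)  # half-rate head
        self.head = nn.Linear(2 * channels, n_classes)

    def forward(self, x, adj):  # x: (B, C, V, T) skeleton features, adj: (V, V)
        x = torch.einsum('bcvt,vw->bcwt', x, adj).mean(2)  # graph agg, pool joints
        f = self.fine(x).mean(-1)                          # fine temporal features
        c = self.coarse(F.avg_pool1d(x, 2)).mean(-1)       # half temporal resolution
        return self.head(torch.cat([f, c], dim=-1))

x, adj = torch.randn(2, 8, 25, 64), torch.eye(25)  # 25 joints, 64 frames
print(DualHeadSTGraph(8, 10)(x, adj).shape)        # torch.Size([2, 10])
```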
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition [53.17837649440601]
We propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image.
Experiments on public multi-label benchmarks demonstrate the effectiveness of our method.
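The summary says a specific graph is generated per image. A minimal sketch of that idea, with details assumed rather than taken from ADD-GCN: a per-image adjacency built from scaled dot-product attention between label features, followed by one GCN propagation step.

```python
import torch
import torch.nn as nn

class DynamicGraphConv(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q, self.k = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.w = nn.Linear(dim, dim)

    def forward(self, labels):  # labels: (B, L, d) per-image label features
        att = self.q(labels) @ self.k(labels).transpose(1, 2)   # (B, L, L) scores
        adj = torch.softmax(att / labels.shape[-1] ** 0.5, -1)  # image-specific graph
        return torch.relu(adj @ self.w(labels))                 # one GCN propagation

feats = torch.randn(4, 20, 32)   # 4 images, 20 candidate labels, 32-dim features
print(DynamicGraphConv(32)(feats).shape)  # torch.Size([4, 20, 32])
```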
arXiv Detail & Related papers (2020-12-05T10:10:12Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
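Weight inflation is a standard recipe (popularized by I3D): repeat each pretrained 2D kernel along the new time axis and rescale so a static clip produces the same activations as the 2D network. A generic sketch, not necessarily the authors' exact procedure:

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_ksize: int = 3) -> nn.Conv3d:
    """Build a Conv3d whose kernels are the Conv2d kernels repeated over time."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_ksize, *conv2d.kernel_size),
        padding=(time_ksize // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # (O, I, kH, kW) -> (O, I, kT, kH, kW), divided by kT to preserve scale
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, time_ksize, 1, 1)
        conv3d.weight.copy_(w / time_ksize)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

conv3d = inflate_conv2d(nn.Conv2d(3, 8, 3, padding=1))
clip = torch.randn(1, 3, 16, 32, 32)  # (batch, channels, time, H, W)
print(conv3d(clip).shape)             # torch.Size([1, 8, 16, 32, 32])
```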
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA, and MMI, as well as the challenging in-the-wild dataset AFEW 8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- Time Dependence in Non-Autonomous Neural ODEs [74.78386661760662]
We propose a novel family of Neural ODEs with time-varying weights.
We outperform previous Neural ODE variants in both speed and representational capacity.
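A minimal sketch of time-varying weights, assuming one common parameterization (a small hypernetwork mapping t to a weight matrix) and a fixed-step Euler integrator; the paper's actual family of parameterizations may differ.

```python
import torch
import torch.nn as nn

class TimeVaryingODEFunc(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        # hypernetwork: t -> flattened weight matrix W(t) of shape (dim, dim)
        self.hyper = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                   nn.Linear(hidden, dim * dim))
        self.dim = dim

    def forward(self, t, x):                                   # x: (B, dim)
        w = self.hyper(t.view(1, 1)).view(self.dim, self.dim)  # weights at time t
        return torch.tanh(x @ w.T)                             # dx/dt = tanh(W(t) x)

def odeint_euler(func, x0, t0=0.0, t1=1.0, steps=20):
    """Integrate dx/dt = func(t, x) from t0 to t1 with fixed-step Euler."""
    x, t = x0, torch.tensor(t0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        x = x + dt * func(t, x)
        t = t + dt
    return x

x0 = torch.randn(4, 8)
print(odeint_euler(TimeVaryingODEFunc(8), x0).shape)  # torch.Size([4, 8])
```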
arXiv Detail & Related papers (2020-05-05T01:41:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.