A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning
- URL: http://arxiv.org/abs/2401.01495v1
- Date: Wed, 3 Jan 2024 01:58:31 GMT
- Title: A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning
- Authors: Wei Ai, FuChen Zhang, Tao Meng, YunTao Shou, HongEn Shao, Keqin Li
- Abstract summary: We propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). TS-GCL outperforms previous methods on the IEMOCAP and MELD datasets.
- Score: 13.197551708300345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In human-computer interaction, correctly understanding the user's emotional state in a conversation is becoming increasingly important, so the task of multimodal emotion recognition (MER) has received growing attention. However, existing emotion classification methods usually perform classification only once, and sentences are likely to be misclassified in a single round. Previous work also tends to ignore the similarities and differences between features of different modalities during fusion. To address these issues, we propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). First, we encode the original dataset with modality-specific preprocessing. Second, a graph contrastive learning (GCL) strategy is introduced for the three modalities, represented as graph structures, to learn similarities and differences within and between modalities. Finally, we apply an MLP twice to obtain the final emotion classification. This staged classification helps the model focus on different levels of emotional information, thereby improving performance. Extensive experiments show that TS-GCL outperforms previous methods on the IEMOCAP and MELD datasets.
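The abstract gives only the shape of the pipeline; below is a minimal PyTorch sketch of the two-round classification idea, where a second MLP re-reads the fused features together with the first round's logits. All module names and dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStageClassifier(nn.Module):
    """Illustrative two-stage head: a first MLP produces a coarse emotion
    estimate, and a second MLP refines it from the fused features
    concatenated with the coarse logits (dimensions are assumptions)."""
    def __init__(self, fused_dim=256, hidden=128, num_classes=6):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )
        self.stage2 = nn.Sequential(
            nn.Linear(fused_dim + num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, fused):
        coarse = self.stage1(fused)                        # first-round logits
        refined = self.stage2(torch.cat([fused, coarse], dim=-1))
        return coarse, refined                             # both can be supervised

# usage: fused utterance features coming out of the three modality encoders
fused = torch.randn(8, 256)            # batch of 8 fused feature vectors
coarse_logits, refined_logits = TwoStageClassifier()(fused)
```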
Related papers
- Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification [10.667645628712542]
This paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL) specially designed for incremental WSI classification.
Experiments on four TCGA datasets demonstrate that our QPMIL-VL framework is effective for incremental WSI classification.
arXiv Detail & Related papers (2024-10-14T14:49:34Z)
- Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition [2.4660652494309936]
MultiDAG+CL is a novel approach for Multimodal Emotion Recognition in Conversation (ERC).
The model is enhanced by Curriculum Learning (CL) to address challenges related to emotional shifts and data imbalance.
Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models; a minimal curriculum-ordering sketch follows this entry.
arXiv Detail & Related papers (2024-02-27T07:28:05Z)
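The summary names Curriculum Learning as the key addition; below is a minimal sketch of an easy-to-hard training schedule. The difficulty scores and the linear pacing are assumptions, not the MultiDAG+CL recipe.

```python
import torch
from torch.utils.data import DataLoader, Subset

def curriculum_loaders(dataset, difficulty, num_stages=4, batch_size=32):
    """Yield one DataLoader per curriculum stage, each adding the next
    hardest slice of the data. `difficulty` is one score per sample
    (an assumption; e.g., a pretrained model's loss or an emotion-shift
    count could serve as the score)."""
    order = torch.argsort(torch.as_tensor(difficulty))   # easy -> hard
    for stage in range(1, num_stages + 1):
        cutoff = int(len(order) * stage / num_stages)    # grow the pool each stage
        subset = Subset(dataset, order[:cutoff].tolist())
        yield DataLoader(subset, batch_size=batch_size, shuffle=True)
```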
- Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662]
Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence.
Much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, yet researchers often ignore the negative impact of imbalanced data on emotion recognition.
We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data.
We have conducted extensive experiments on the IEMOCAP and MELD benchmark datasets, and the results show that CBERL achieves a clear performance improvement in emotion recognition; a generic re-weighting sketch follows this entry.
arXiv Detail & Related papers (2023-12-11T12:35:17Z)
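CBERL's actual mechanism is not detailed in the summary; as a generic illustration of countering the imbalanced emotion distribution it targets, here is a cross-entropy loss re-weighted by inverse class frequency (a common baseline, not CBERL itself).

```python
import torch
import torch.nn as nn

def balanced_ce(labels, num_classes):
    """Cross-entropy weighted by inverse class frequency, a standard
    baseline for imbalanced emotion categories (not CBERL itself)."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    weights = counts.sum() / (num_classes * counts.clamp(min=1))
    return nn.CrossEntropyLoss(weight=weights)

labels = torch.tensor([0, 0, 0, 0, 1, 2])   # toy imbalanced label set
criterion = balanced_ce(labels, num_classes=3)
```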
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches that learn from data of a single class only.
We propose a deep learning one-class classification method suitable for multimodal data; a reconstruction-error sketch follows this entry.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
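A minimal single-modality sketch of the reconstruction-error idea behind autoencoder-based one-class classification; the paper's model is multimodal, and the architecture sizes here are assumptions.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Small convolutional autoencoder; one-class scoring uses the
    reconstruction error (layer sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model, x):
    # high reconstruction error -> unlikely to belong to the training class
    with torch.no_grad():
        return ((model(x) - x) ** 2).flatten(1).mean(dim=1)

model = ConvAE()
x = torch.randn(4, 1, 28, 28)        # e.g., 28x28 single-channel inputs
scores = anomaly_score(model, x)     # larger score = more anomalous
```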
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature; a minimal InfoNCE sketch follows this entry.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
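A minimal sketch of a pairwise-modalities contrastive loss in the InfoNCE style: matching (text_i, audio_i) pairs are positives and the other batch items are negatives. This is a standard formulation assumed for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pairwise_nce(z_a, z_b, temperature=0.07):
    """InfoNCE between two modality embeddings of the same utterances:
    matching pairs are positives, all other batch items are negatives."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # cosine similarity matrix
    targets = torch.arange(z_a.size(0))       # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# apply to each modality pair: (text, audio), (text, vision), (audio, vision)
text, audio = torch.randn(16, 128), torch.randn(16, 128)
loss = pairwise_nce(text, audio)
```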
- Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition? [36.67937514793215]
Cross-modal attention is seen as an effective mechanism for multi-modal fusion.
We implement and compare a cross-attention and a self-attention model.
We compare the models using different modality combinations for a 7-class emotion classification task; a cross-attention sketch follows this entry.
arXiv Detail & Related papers (2022-02-18T15:44:14Z)
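For reference, a minimal cross-attention block of the kind compared in this paper: one modality supplies the queries and the other the keys and values (dimensions are assumptions).

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Modality A attends over modality B: queries come from A,
    keys and values from B (a self-attention baseline would instead
    run attention over the concatenated sequence)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        out, _ = self.attn(query=a, key=b, value=b)
        return out

text = torch.randn(8, 50, 128)     # (batch, text steps, dim)
audio = torch.randn(8, 200, 128)   # (batch, audio frames, dim)
fused = CrossAttention()(text, audio)   # text attends over audio
```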
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance; a text-only prompt sketch follows this entry.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
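A text-only sketch of the prompt idea: the emotion label is predicted as the masked token of a template sentence. MEmoBERT itself is multimodal and pre-trained differently; the model, template, and label words below are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Sketch of "classification as masked prediction" with a plain BERT;
# the template and verbalizer words are assumptions, not MEmoBERT's.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

utterance = "I can't believe we actually won the finals!"
prompt = f"{utterance} I am feeling {tok.mask_token}."
verbalizers = ["happy", "sad", "angry", "neutral"]   # assumed label words

inputs = tok(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]       # vocab scores at [MASK]
label_ids = tok.convert_tokens_to_ids(verbalizers)
print(verbalizers[logits[label_ids].argmax().item()])
```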
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in real-world federated systems is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10; a calibration sketch follows this entry.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
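A sketch of the calibration step: fit a per-class Gaussian to feature vectors, sample virtual representations, and retune only the classifier head. The paper estimates the mixture from aggregated client statistics; the diagonal Gaussians, direct feature access, and hyperparameters here are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def calibrate_classifier(features, labels, classifier, num_classes, n_virtual=100):
    """Sample virtual representations from per-class diagonal Gaussians
    fitted to features, then retrain only the classifier head on them
    (a simplified stand-in for CCVR's aggregated Gaussian mixture)."""
    virtual_x, virtual_y = [], []
    for c in range(num_classes):
        feats = features[labels == c]
        mean, std = feats.mean(0), feats.std(0) + 1e-4
        virtual_x.append(mean + std * torch.randn(n_virtual, feats.size(1)))
        virtual_y.append(torch.full((n_virtual,), c))
    x, y = torch.cat(virtual_x), torch.cat(virtual_y)
    opt = torch.optim.SGD(classifier.parameters(), lr=0.01)
    for _ in range(10):                       # brief calibration-only training
        opt.zero_grad()
        loss = F.cross_entropy(classifier(x), y)
        loss.backward()
        opt.step()

# usage: classifier = torch.nn.Linear(feat_dim, num_classes) from the global model
```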
- PK-GCN: Prior Knowledge Assisted Image Classification using Graph Convolution Networks [3.4129083593356433]
Similarity between classes can influence the performance of classification.
We propose a method that incorporates class-similarity knowledge into convolutional neural network models.
Experimental results show that our model can improve classification accuracy, especially when the amount of available data is small; a similarity-smoothing sketch follows this entry.
arXiv Detail & Related papers (2020-09-24T18:31:35Z)
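PK-GCN's graph-convolution formulation is not spelled out in the summary; as a generic stand-in for injecting class-similarity prior knowledge into training, here is a loss over similarity-smoothed soft targets.

```python
import torch
import torch.nn.functional as F

def similarity_smoothed_loss(logits, labels, sim, alpha=0.1):
    """Soft targets that spread `alpha` of the probability mass over
    classes similar to the true one. `sim` is a (C, C) class-similarity
    matrix from prior knowledge; this is a generic illustration, not
    PK-GCN's actual graph-convolution mechanism."""
    sim = sim / sim.sum(dim=1, keepdim=True)        # row-normalize the prior
    hard = F.one_hot(labels, sim.size(0)).float()
    soft = (1 - alpha) * hard + alpha * sim[labels]
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```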
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework called SAL, with two GANs at its base in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- Learning Attentive Pairwise Interaction for Fine-Grained Classification [53.66543841939087]
We propose a simple but effective Attentive Pairwise Interaction Network (API-Net) for fine-grained classification.
API-Net first learns a mutual feature vector to capture semantic differences in the input pair.
It then compares this mutual vector with individual vectors to generate gates for each input image.
We conduct extensive experiments on five popular benchmarks in fine-grained classification; a sketch of the mutual-vector gating follows this entry.
arXiv Detail & Related papers (2020-02-24T12:17:56Z)
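A minimal sketch of the interaction described above: a mutual vector computed from the pair, sigmoid gates from comparing it with each individual vector, and gated enhancement of both features (layer sizes and the exact gating arithmetic are assumptions).

```python
import torch
import torch.nn as nn

class PairwiseInteraction(nn.Module):
    """Sketch of attentive pairwise interaction: a mutual vector captures
    what the pair shares, and comparing it with each individual feature
    yields a per-image gate (sizes are illustrative, not API-Net's)."""
    def __init__(self, dim=512):
        super().__init__()
        self.mutual = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, dim))

    def forward(self, x1, x2):
        m = self.mutual(torch.cat([x1, x2], dim=-1))   # mutual feature vector
        g1 = torch.sigmoid(x1 * m)                     # gate from comparing x1 with m
        g2 = torch.sigmoid(x2 * m)                     # gate from comparing x2 with m
        return x1 * g1, x2 * g2                        # gated, discriminative features

x1, x2 = torch.randn(4, 512), torch.randn(4, 512)      # features of an image pair
y1, y2 = PairwiseInteraction()(x1, x2)
```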