Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
- URL: http://arxiv.org/abs/2302.09856v2
- Date: Tue, 21 Feb 2023 08:53:48 GMT
- Title: Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
- Authors: Zihan Zhao, Yu Wang, Yanfeng Wang
- Abstract summary: Experimental results on the IEMOCAP dataset show that the proposed approach can outperform several state-of-the-art approaches by at least 0.7% unweighted accuracy (UA).
- Score: 15.133202035812017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal emotion recognition is a challenging research area that aims to
fuse different modalities to predict human emotion. However, most existing
models that are based on attention mechanisms have difficulty in learning
emotionally relevant parts on their own. To solve this problem, we propose to
incorporate external emotion-related knowledge in the co-attention based fusion
of pre-trained models. To effectively incorporate this knowledge, we enhance
the co-attention model with a Bayesian attention module (BAM) where a prior
distribution is estimated using the emotion-related knowledge. Experimental
results on the IEMOCAP dataset show that the proposed approach can outperform
several state-of-the-art approaches by at least 0.7% unweighted accuracy (UA).
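To make the fusion idea concrete, below is a minimal PyTorch-style sketch of a co-attention step whose attention distribution is biased by an emotion-knowledge prior. The module name, feature dimensions, and the way the prior enters the logits are illustrative assumptions rather than the authors' released implementation; a full Bayesian attention module would additionally sample the attention weights from a distribution regularised toward this prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeAwareCoAttention(nn.Module):
    """Illustrative co-attention block in which an emotion-knowledge prior
    biases the attention distribution (simplified stand-in for a full
    Bayesian attention module that would sample the weights)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query_mod, key_mod, knowledge_prior):
        # query_mod:       (B, Tq, D) e.g. text features from a pre-trained encoder
        # key_mod:         (B, Tk, D) e.g. audio features from a pre-trained encoder
        # knowledge_prior: (B, Tk)    non-negative emotion-relevance scores
        q = self.q_proj(query_mod)
        k = self.k_proj(key_mod)
        v = self.v_proj(key_mod)
        logits = torch.matmul(q, k.transpose(-1, -2)) * self.scale   # (B, Tq, Tk)
        # Bias attention toward emotionally relevant frames by adding the
        # log of the normalised knowledge prior to the attention logits.
        prior = knowledge_prior / knowledge_prior.sum(dim=-1, keepdim=True)
        logits = logits + torch.log(prior + 1e-8).unsqueeze(1)
        attn = F.softmax(logits, dim=-1)
        return torch.matmul(attn, v)                                  # (B, Tq, D)


# Toy usage: 2 utterances, 10 text tokens, 20 audio frames, 256-dim features.
fusion = KnowledgeAwareCoAttention(dim=256)
text = torch.randn(2, 10, 256)
audio = torch.randn(2, 20, 256)
relevance = torch.rand(2, 20)           # e.g. lexicon/knowledge-derived scores
fused = fusion(text, audio, relevance)  # (2, 10, 256)
```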
Related papers
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition [23.505616142198487]
We develop Mood Reader, a multimodal approach based on pre-trained models for cross-subject emotion recognition.
The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset.
Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks.
arXiv Detail & Related papers (2024-05-28T14:31:11Z) - Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences [4.740624855896404]
We propose a contrastive learning framework utilizing selective strong augmentation for self-supervised gait-based emotion representation.
Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.
arXiv Detail & Related papers (2024-05-08T09:13:10Z) - Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662]
Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence.
Much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, and the negative impact of this imbalance on emotion recognition is often ignored.
We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data.
We have conducted extensive experiments on the IEMOCAP and MELD benchmark datasets, and the results show that CBERL achieves a clear improvement in emotion recognition performance.
arXiv Detail & Related papers (2023-12-11T12:35:17Z) - Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions, with text, audio and vision used as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
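As a rough illustration of a modality-pairwise contrastive objective, the sketch below computes a symmetric InfoNCE loss between two modality embeddings of the same utterances; the temperature and normalisation choices are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_modality_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two modality embeddings of the same samples.
    z_a, z_b: (B, D), e.g. text and audio embeddings of the same utterances."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (B, B) cross-modal similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Matching (i, i) pairs are positives; all other pairs in the batch are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# For three modalities (text, audio, vision) the total loss would sum over the pairs:
# L = nce(text, audio) + nce(text, vision) + nce(audio, vision)
loss = pairwise_modality_nce(torch.randn(16, 128), torch.randn(16, 128))
```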
arXiv Detail & Related papers (2022-07-23T10:11:24Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
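A minimal sketch of such a late-fusion head, assuming utterance-level embeddings have already been extracted by the fine-tuned speech and text encoders; the dimensions, hidden sizes, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenates utterance-level speech and text embeddings produced by
    separately fine-tuned encoders and maps them to emotion classes."""

    def __init__(self, speech_dim=512, text_dim=768, num_classes=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, speech_emb, text_emb):
        # speech_emb: (B, 512) from a speaker-recognition-style encoder
        # text_emb:   (B, 768) from a BERT-style encoder
        return self.head(torch.cat([speech_emb, text_emb], dim=-1))

clf = LateFusionClassifier()
logits = clf(torch.randn(8, 512), torch.randn(8, 768))   # (8, 4) emotion logits
```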
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
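The sketch below illustrates one way such a top-down feedback path can be wired: a high-level fused state produces a mask that re-weights low-level modality features within the same forward pass. The gating form is an assumption for illustration, not the MMLatch implementation.

```python
import torch
import torch.nn as nn

class TopDownGate(nn.Module):
    """A high-level fused state produces a sigmoid mask that re-weights the
    low-level features of one modality in the same forward pass."""

    def __init__(self, high_dim, low_dim):
        super().__init__()
        self.to_mask = nn.Linear(high_dim, low_dim)

    def forward(self, low_feats, high_state):
        # low_feats:  (B, T, low_dim)  low/mid-level modality features
        # high_state: (B, high_dim)    fused high-level representation
        mask = torch.sigmoid(self.to_mask(high_state)).unsqueeze(1)  # (B, 1, low_dim)
        return low_feats * mask                                       # gated features

gate = TopDownGate(high_dim=256, low_dim=64)
gated_audio = gate(torch.randn(4, 50, 64), torch.randn(4, 256))       # (4, 50, 64)
```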
arXiv Detail & Related papers (2022-01-24T17:48:04Z) - Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition [13.994609732846344]
The most effective techniques for emotion recognition leverage diverse and complementary sources of information.
We introduce a cross-attentional fusion approach to extract the salient features across audio-visual (A-V) modalities.
Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
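A minimal sketch of cross-attentional A-V fusion using standard multi-head attention, where each stream is refined by attending over the other before pooling; the layer sizes and mean pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Audio features attend over visual features (and vice versa), so each stream
# is refined by the complementary modality before the two are pooled and fused.
dim = 128
a2v = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
v2a = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

audio = torch.randn(2, 100, dim)   # (B, T_audio, D)
video = torch.randn(2, 30, dim)    # (B, T_video, D)

audio_ctx, _ = a2v(query=audio, key=video, value=video)   # audio refined by vision
video_ctx, _ = v2a(query=video, key=audio, value=audio)   # vision refined by audio
fused = torch.cat([audio_ctx.mean(dim=1), video_ctx.mean(dim=1)], dim=-1)  # (2, 2*D)
```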
arXiv Detail & Related papers (2021-11-09T16:01:56Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
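The reformulation can be pictured as wrapping each utterance in a prompt and letting a masked-LM head score emotion words at the [MASK] position. The template and verbalizer below are illustrative assumptions, not the paper's exact prompt.

```python
# Emotion labels are mapped to verbalizer words that the masked-LM head can score.
VERBALIZER = {"angry": "angry", "happy": "happy", "neutral": "neutral", "sad": "sad"}

def build_prompt(utterance: str) -> str:
    # Illustrative template; the actual template used by MEmoBERT may differ.
    return f"{utterance} I am [MASK]."

prompt = build_prompt("I can't believe you did that again!")
# A masked-LM head then scores the verbalizer words at the [MASK] position,
# e.g. p("angry") vs p("happy"); the highest-scoring word gives the emotion label.
print(prompt)
```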
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition [7.559720049837459]
We present a novel attentive cross-modal connection to share information between convolutional neural networks.
Specifically, these connections improve emotion classification by sharing intermediate representations between the EDA and ECG streams.
Our experiments show that the proposed approach is capable of learning strong multimodal representations and outperforms a number of baseline methods.
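A minimal sketch of an attentive cross-modal connection between two 1-D CNN streams (EDA and ECG), where learned attention weights control how much of each stream's intermediate features is shared with the other; the exact gating form is an assumption for illustration.

```python
import torch
import torch.nn as nn

class CrossModalConnection(nn.Module):
    """Shares intermediate representations between two 1-D CNN streams
    (e.g. EDA and ECG) through learned attention weights."""

    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, T) intermediate feature maps of the two streams
        pooled = torch.cat([feat_a.mean(dim=-1), feat_b.mean(dim=-1)], dim=-1)  # (B, 2C)
        w = self.attn(pooled).unsqueeze(-1)                                      # (B, C, 1)
        # Each stream receives an attention-weighted mix of the other stream.
        return feat_a + w * feat_b, feat_b + w * feat_a

conn = CrossModalConnection(channels=32)
eda, ecg = torch.randn(4, 32, 200), torch.randn(4, 32, 200)
eda_shared, ecg_shared = conn(eda, ecg)   # both (4, 32, 200)
```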
arXiv Detail & Related papers (2021-08-04T18:40:32Z) - Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.