Modulated Fusion using Transformer for Linguistic-Acoustic Emotion
Recognition
- URL: http://arxiv.org/abs/2010.02057v1
- Date: Mon, 5 Oct 2020 14:46:20 GMT
- Title: Modulated Fusion using Transformer for Linguistic-Acoustic Emotion
Recognition
- Authors: Jean-Benoit Delbrouck and Noé Tits and Stéphane Dupont
- Abstract summary: This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis.
Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field.
- Score: 7.799182201815763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to bring a new lightweight yet powerful solution for the task
of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two
architectures based on Transformers and modulation that combine the linguistic
and acoustic inputs from a wide range of datasets to challenge, and sometimes
surpass, the state-of-the-art in the field. To demonstrate the efficiency of
our models, we carefully evaluate their performances on the IEMOCAP, MOSI,
MOSEI and MELD datasets. The experiments can be directly replicated and the code
is fully open for future research.
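As a rough illustration of the approach described above, the sketch below shows one plausible reading of "modulated fusion": a Transformer encoder over the linguistic tokens whose features are modulated, FiLM-style, by an utterance-level acoustic summary vector before classification. The abstract does not specify the exact architecture, so every module, dimension, and the classifier head here is an assumption for illustration, not the authors' implementation.

```python
# Illustrative sketch only: FiLM-style modulation of a linguistic Transformer
# encoder by an acoustic summary vector. Module names, dimensions and the
# classifier head are assumptions, not the paper's published architecture.
import torch
import torch.nn as nn


class ModulatedFusionClassifier(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2,
                 acoustic_dim=40, n_classes=4):
        super().__init__()
        # Transformer encoder over the linguistic token embeddings.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # The acoustic summary predicts per-feature scale (gamma) and shift (beta).
        self.film = nn.Linear(acoustic_dim, 2 * d_model)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_emb, acoustic_feat):
        # text_emb:      (batch, seq_len, d_model) pre-computed word embeddings
        # acoustic_feat: (batch, acoustic_dim) utterance-level acoustic features
        h = self.text_encoder(text_emb)                     # (B, T, d_model)
        gamma, beta = self.film(acoustic_feat).chunk(2, dim=-1)
        h = gamma.unsqueeze(1) * h + beta.unsqueeze(1)      # modulate token features
        pooled = h.mean(dim=1)                              # simple mean pooling
        return self.classifier(pooled)                      # emotion logits


if __name__ == "__main__":
    model = ModulatedFusionClassifier()
    logits = model(torch.randn(8, 20, 128), torch.randn(8, 40))
    print(logits.shape)  # torch.Size([8, 4])
```

Modulation of this kind keeps the fusion lightweight: the acoustic branch only needs to produce 2 x d_model parameters per utterance rather than a full cross-attention over both sequences.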
Related papers
- EEG-based Multimodal Representation Learning for Emotion Recognition [26.257531037300325]
We introduce a novel multimodal framework that accommodates not only conventional modalities such as video, images, and audio, but also EEG data.
Our framework is designed to flexibly handle varying input sizes, while dynamically adjusting attention to account for feature importance across modalities.
arXiv Detail & Related papers (2024-10-29T01:35:17Z) - EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings.
EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
arXiv Detail & Related papers (2024-10-02T23:00:31Z) - Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges in constructing intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z) - AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Exploring Attention Mechanisms for Multimodal Emotion Recognition in an
Emergency Call Center Corpus [4.256247917850421]
This paper explores the different fusion strategies of modality-specific models for emotion recognition.
We show that multimodal fusion brings an absolute gain of 4-9% over either single modality.
Our experiments also suggest that for the real-life CEMO corpus, the audio component encodes more emotive information than the textual one.
arXiv Detail & Related papers (2023-06-12T13:43:20Z) - A Comprehensive Survey on Applications of Transformers for Deep Learning
Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z) - Improving the Generalizability of Text-Based Emotion Detection by
Leveraging Transformers with Psycholinguistic Features [27.799032561722893]
We propose approaches for text-based emotion detection that leverage transformer models (BERT and RoBERTa) in combination with Bidirectional Long Short-Term Memory (BiLSTM) networks trained on a comprehensive set of psycholinguistic features.
We find that the proposed hybrid models improve the ability to generalize to out-of-distribution data compared to a standard transformer-based approach.
arXiv Detail & Related papers (2022-12-19T13:58:48Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - A Transformer-based joint-encoding for Emotion Recognition and Sentiment
Analysis [8.927538538637783]
This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis.
In addition to using the Transformer architecture, our approach relies on modular co-attention and a glimpse layer to jointly encode one or more modalities.
arXiv Detail & Related papers (2020-06-29T11:51:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.