A Multi-Task, Multi-Modal Approach for Predicting Categorical and
Dimensional Emotions
- URL: http://arxiv.org/abs/2401.00536v1
- Date: Sun, 31 Dec 2023 16:48:03 GMT
- Title: A Multi-Task, Multi-Modal Approach for Predicting Categorical and
Dimensional Emotions
- Authors: Alex-Răzvan Ispas, Théo Deschamps-Berger, Laurence Devillers
- Abstract summary: We propose a multi-task, multi-modal system that predicts categorical and dimensional emotions.
Results emphasise the importance of cross-regularisation between the two types of emotions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech emotion recognition (SER) has received a great deal of attention in recent years in the context of spontaneous conversations. While there have been notable results on datasets such as IEMOCAP, the well-known corpus of naturalistic dyadic conversations, for both categorical and dimensional emotions, few papers attempt to predict both paradigms at the same time. In this work, we therefore aim to highlight the performance contribution of multi-task learning by proposing a multi-task, multi-modal system that predicts categorical and dimensional emotions. The results emphasise the importance of cross-regularisation between the two types of emotions. Our approach consists of a multi-task, multi-modal architecture that uses parallel feature refinement through self-attention on the features of each modality. To fuse the features, our model introduces a set of learnable bridge tokens that merge the acoustic and linguistic features with the help of cross-attention. Our experiments for categorical emotions on 10-fold validation yield results comparable to the current state-of-the-art. In our configuration, the multi-task approach provides better results than learning each paradigm separately. On top of that, our best-performing model achieves a higher valence result than previous multi-task experiments.
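The abstract describes the architecture only at a high level: per-modality self-attention refinement, a set of learnable bridge tokens fused through cross-attention, and joint categorical/dimensional prediction heads. The PyTorch sketch below is one plausible reading of that description, not the authors' implementation; the hidden size, number of bridge tokens, mean pooling, output dimensions, and the loss weighting alpha are all illustrative assumptions.

```python
# Hedged sketch of the architecture outlined in the abstract: parallel
# self-attention refinement per modality, learnable bridge tokens fused via
# cross-attention, and joint categorical + dimensional heads. All sizes and
# the loss weighting are assumptions, not the paper's reported configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BridgeTokenFusionSER(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_bridge=8, n_classes=4, n_dims=3):
        super().__init__()
        # Parallel feature refinement: one self-attention encoder layer per modality.
        self.refine_acoustic = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.refine_linguistic = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Learnable bridge tokens that merge the two modalities via cross-attention.
        self.bridge = nn.Parameter(torch.randn(1, n_bridge, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two task heads: categorical emotions and dimensional emotions
        # (e.g. valence / arousal / dominance).
        self.cat_head = nn.Linear(d_model, n_classes)
        self.dim_head = nn.Linear(d_model, n_dims)

    def forward(self, acoustic, linguistic):
        # acoustic: (B, T_a, d_model), linguistic: (B, T_l, d_model)
        a = self.refine_acoustic(acoustic)
        t = self.refine_linguistic(linguistic)
        # Bridge tokens query the concatenated acoustic + linguistic features.
        bridge = self.bridge.expand(a.size(0), -1, -1)
        merged = torch.cat([a, t], dim=1)
        fused, _ = self.cross_attn(bridge, merged, merged)
        pooled = fused.mean(dim=1)  # mean pooling over bridge tokens (assumption)
        return self.cat_head(pooled), self.dim_head(pooled)


def multitask_loss(cat_logits, dim_preds, cat_labels, dim_targets, alpha=0.5):
    # Joint objective combining both paradigms, reflecting the cross-regularisation
    # idea; the weighting alpha is a placeholder, not the authors' setting.
    return F.cross_entropy(cat_logits, cat_labels) + alpha * F.mse_loss(dim_preds, dim_targets)
```

In practice, the acoustic and linguistic sequences would come from pretrained encoders projected to a common dimension before being passed to this module; that front end is omitted here.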
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA) task.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task [3.489826905722736]
SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations.
This paper proposes models that tackle this task as an utterance labeling and a sequence labeling problem.
On the official leaderboard for the task, our architecture ranked 8th with an F1-score of 0.1759.
arXiv Detail & Related papers (2024-04-02T16:32:49Z)
- AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations [39.79734528362605]
The Multimodal Attention Network captures cross-modal interactions at various levels of spatial abstraction.
The AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
arXiv Detail & Related papers (2024-01-26T19:17:05Z)
- Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition [18.571931295274975]
Multimodal emotion recognition aims to recognize emotions for each utterance across multiple modalities.
Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue.
We propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful).
arXiv Detail & Related papers (2023-11-18T08:21:42Z)
- MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts [92.76662894585809]
We introduce an approach to enhance multimodal models, which we call Multimodal Mixtures of Experts (MMoE).
MMoE is able to be applied to various types of models to gain improvement.
arXiv Detail & Related papers (2023-11-16T05:31:21Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose the Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between sub-networks of different modalities.
For dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Multitask Learning for Emotion and Personality Detection [17.029426018676997]
We build on the known correlation between personality traits and emotional behaviors, and propose a novel multitask learning framework, SoGMTL.
Our more computationally efficient CNN-based multitask model achieves state-of-the-art performance across multiple well-known personality and emotion datasets.
arXiv Detail & Related papers (2021-01-07T03:09:55Z)
- Pedestrian Behavior Prediction via Multitask Learning and Categorical Interaction Modeling [13.936894582450734]
We propose a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data.
We show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 6% respectively.
arXiv Detail & Related papers (2020-12-06T15:57:11Z)