Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment
Analysis
- URL: http://arxiv.org/abs/2208.03051v1
- Date: Fri, 5 Aug 2022 09:07:58 GMT
- Title: Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment
Analysis
- Authors: Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou,
Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, Xiao Sun, Meng Wang
- Abstract summary: We present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges.
The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilising different modalities and data sets.
- Score: 31.097398034974436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present our solutions for the Multimodal Sentiment Analysis
Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress
Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions
and multimodal emotional stress utilising different modalities and data sets.
In our work, different kinds of multimodal features are extracted, including
acoustic, visual, text and biological features. These features are fused by
TEMMA- and GRU-based frameworks with a self-attention mechanism. The
contributions of this paper are: 1) several new audio features, facial
expression features and paragraph-level text embeddings are extracted to
improve accuracy; 2) the accuracy and reliability of multimodal sentiment
prediction are substantially improved by mining and blending the multimodal
features; 3) effective data augmentation strategies are applied in model
training to alleviate sample imbalance and prevent the model from learning
biased subject characteristics. For the MuSe-Humor sub-challenge, our model
obtains an AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the
Pearson's Correlation Coefficient of our approach on the test set is 0.3879,
which outperforms all other participants. For the MuSe-Stress sub-challenge,
our approach outperforms the baseline in both arousal and valence on the test
dataset, reaching a final combined result of 0.5151.
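The abstract describes fusing per-modality features with TEMMA- and GRU-based frameworks plus self-attention. Below is a minimal, illustrative PyTorch sketch of GRU + self-attention fusion over projected unimodal features; the feature dimensions, sum-based fusion, pooling and layer sizes are assumptions for illustration, not the authors' TEMMA/GRU implementation.

```python
# Minimal sketch: late fusion of per-modality features with a GRU and
# self-attention, loosely following the pipeline described in the abstract.
# All sizes and the summation-based fusion are illustrative assumptions.
import torch
import torch.nn as nn


class GRUSelfAttentionFusion(nn.Module):
    def __init__(self, feat_dims, hidden_dim=128, num_heads=4, num_outputs=1):
        super().__init__()
        # One linear projection per modality so all features share a common size.
        self.projections = nn.ModuleList(nn.Linear(d, hidden_dim) for d in feat_dims)
        # GRU models temporal dynamics over the frame-level fused features.
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Self-attention lets each time step attend to the whole sequence.
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.head = nn.Linear(2 * hidden_dim, num_outputs)

    def forward(self, modality_feats):
        # modality_feats: list of tensors, each of shape (batch, time, feat_dim_i)
        projected = [proj(x) for proj, x in zip(self.projections, modality_feats)]
        fused = torch.stack(projected, dim=0).sum(dim=0)   # simple additive fusion
        seq, _ = self.gru(fused)                           # (batch, time, 2*hidden)
        attended, _ = self.attn(seq, seq, seq)             # self-attention over time
        pooled = attended.mean(dim=1)                      # temporal average pooling
        return self.head(pooled)


if __name__ == "__main__":
    # Toy example with hypothetical acoustic (88-d), visual (512-d), text (768-d) features.
    model = GRUSelfAttentionFusion(feat_dims=[88, 512, 768])
    feats = [torch.randn(2, 50, d) for d in (88, 512, 768)]
    print(model(feats).shape)  # torch.Size([2, 1])
```

In such a setup, the final head would produce a single logit scored with AUC for MuSe-Humor, or regress continuous emotion values evaluated with Pearson correlation for MuSe-Reaction and MuSe-Stress.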
Related papers
- The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition [64.5207572897806]
The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems.
In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals.
The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor dataset.
arXiv Detail & Related papers (2024-06-11T22:26:20Z)
- Exploiting Diverse Feature for Multimodal Sentiment Analysis [40.39627083212711]
We present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge.
Since different people have distinct personal characteristics, the main challenge of this task is how to build robust feature representations for sentiment prediction.
arXiv Detail & Related papers (2023-08-25T15:06:14Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical analysis of dynamic multimodal fusion under a widely used fusion framework, from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation [69.13075715686622]
MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems.
MuSe 2023 seeks to bring together a broad audience from different research communities.
arXiv Detail & Related papers (2023-05-05T08:53:57Z)
- Hybrid Multimodal Fusion for Humor Detection [16.178078156094067]
We present our solution to the MuSe-Humor sub-challenge of the Multimodal Emotional Challenge (MuSe) 2022.
The goal of the MuSe-Humor sub-challenge is to detect humor in audiovisual recordings of German football Bundesliga press conferences, with performance measured by AUC.
arXiv Detail & Related papers (2022-09-24T07:45:04Z)
- The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress [71.06453250061489]
The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to multimodal sentiment and emotion recognition.
For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor dataset that contains audio-visual recordings of German football coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities; and (iii) the Ulm-Trier Social Stress Test dataset comprising audio-visual data labelled with continuous emotion values of people in stressful dispositions.
arXiv Detail & Related papers (2022-06-23T13:34:33Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
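The MISA entry above describes projecting each modality into a modality-invariant and a modality-specific subspace before fusion. The sketch below illustrates that general idea only; the encoder sizes, cosine-based similarity loss and concatenation fusion are assumptions for illustration, not the MISA authors' implementation.

```python
# Illustrative sketch of modality-invariant / modality-specific projections
# (the idea summarised in the MISA entry above); all sizes and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InvariantSpecificEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.shared = nn.Linear(in_dim, hidden_dim)    # modality-invariant subspace
        self.private = nn.Linear(in_dim, hidden_dim)   # modality-specific subspace

    def forward(self, x):
        return self.shared(x), self.private(x)


def similarity_loss(inv_a, inv_b):
    # Pull the invariant representations of two modalities together (cosine-based).
    return 1.0 - F.cosine_similarity(inv_a, inv_b, dim=-1).mean()


# Toy usage with hypothetical text (768-d) and audio (88-d) utterance-level features.
text_enc, audio_enc = InvariantSpecificEncoder(768), InvariantSpecificEncoder(88)
text_feat, audio_feat = torch.randn(4, 768), torch.randn(4, 88)
t_inv, t_spec = text_enc(text_feat)
a_inv, a_spec = audio_enc(audio_feat)
fused = torch.cat([t_inv, t_spec, a_inv, a_spec], dim=-1)  # simple concatenation fusion
loss = similarity_loss(t_inv, a_inv)
print(fused.shape, loss.item())
```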