UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
- URL: http://arxiv.org/abs/2211.11256v1
- Date: Mon, 21 Nov 2022 08:46:01 GMT
- Title: UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
- Authors: Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, Yongbin Li
- Abstract summary: Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors.
We propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models.
We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions.
- Score: 32.34485263348587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal sentiment analysis (MSA) and emotion recognition in conversation
(ERC) are key research topics for computers to understand human behaviors. From
a psychological perspective, emotions are the expression of affect or feelings
during a short period, while sentiments are formed and held for a longer
period. However, most existing works study sentiment and emotion separately and
do not fully exploit the complementary knowledge behind the two. In this paper,
we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that
unifies MSA and ERC tasks from features, labels, and models. We perform
modality fusion at the syntactic and semantic levels and introduce contrastive
learning between modalities and samples to better capture the difference and
consistency between sentiments and emotions. Experiments on four public
benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the
effectiveness of the proposed method, which achieves consistent improvements
over state-of-the-art methods.
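The inter-modality contrastive learning mentioned in the abstract can be pictured with a short sketch. The snippet below is not the authors' code: it only illustrates an InfoNCE-style contrast between text, audio, and video embeddings of the same utterance, and all module names, feature dimensions, and the temperature are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of an inter-modality
# contrastive objective in the spirit of UniMSE: embeddings of the text,
# audio, and video views of the same utterance are pulled together, while
# embeddings from other utterances in the batch are pushed apart.
# Dimensions, projection heads, and the temperature are assumptions.
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: each anchor's positive is the same-index row of `positive`.

    anchor, positive: (batch, dim) L2-normalized embeddings from two modalities.
    """
    logits = anchor @ positive.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


class ModalityProjector(torch.nn.Module):
    """Projects a raw modality feature into a shared space for contrast."""

    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(in_dim, out_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)


if __name__ == "__main__":
    batch = 8
    # Hypothetical per-utterance features (e.g., pooled language-model states
    # for text, acoustic and visual descriptors); sizes are placeholders.
    text_feat, audio_feat, video_feat = torch.randn(batch, 768), torch.randn(batch, 74), torch.randn(batch, 35)
    text_p, audio_p, video_p = ModalityProjector(768), ModalityProjector(74), ModalityProjector(35)

    t, a, v = text_p(text_feat), audio_p(audio_feat), video_p(video_feat)
    # Contrast every pair of modalities; in the full framework such a term
    # would be added to the main task loss with some weighting factor.
    loss = (info_nce(t, a) + info_nce(t, v) + info_nce(a, v)) / 3
    print(float(loss))
```

How this auxiliary loss is weighted against the main task objective is a design choice the abstract does not specify, so it is left out of the sketch.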
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause [18.99103120856208]
We propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause.
UniMEEC reformulates the MERC and MECPE tasks as mask prediction problems and unifies them with a causal prompt template.
Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks.
arXiv Detail & Related papers (2024-03-30T15:59:17Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) in the recent two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation that exploits prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- Multi-Task Learning and Adapted Knowledge Models for Emotion-Cause Extraction [18.68808042388714]
We present solutions that tackle both emotion recognition and emotion cause detection in a joint fashion.
Considering that common-sense knowledge plays an important role in understanding implicitly expressed emotions, we propose methods that incorporate adapted knowledge models.
We show performance improvement on both tasks when including common-sense reasoning and a multitask framework.
arXiv Detail & Related papers (2021-06-17T20:11:04Z)
- A Multi-Componential Approach to Emotion Recognition and the Effect of Personality [0.0]
This paper applies a componential framework with a data-driven approach to characterize emotional experiences evoked during movie watching.
The results suggest that differences between various emotions can be captured by a few (at least 6) latent dimensions.
Results show that a componential model with a limited number of descriptors is still able to predict the level of experienced discrete emotion.
arXiv Detail & Related papers (2020-10-22T01:27:23Z)
- COSMIC: COmmonSense knowledge for eMotion Identification in Conversations [95.71018134363976]
We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations.
We show that COSMIC achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets.
arXiv Detail & Related papers (2020-10-06T15:09:38Z)
- Temporal aggregation of audio-visual modalities for emotion recognition [0.5352699766206808]
We propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality.
Our proposed method outperforms other methods from the literature as well as human accuracy ratings.
arXiv Detail & Related papers (2020-07-08T18:44:15Z)
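The windowed, offset-based fusion described in the entry above can be sketched as follows. This is an illustrative sketch only, not the paper's method: the window length, per-modality offsets, feature sizes, and mean-pooling aggregation are all assumptions.

```python
# Illustrative sketch: aggregate audio and visual frame features over a
# temporal window, applying a different temporal offset to each modality
# before concatenating them, loosely following the entry above.
# Offsets, window size, and mean pooling are assumptions, not paper details.
import torch


def windowed_fusion(
    audio: torch.Tensor,          # (T, d_audio) per-frame audio features
    video: torch.Tensor,          # (T, d_video) per-frame visual features
    t: int,                       # index of the frame being classified
    window: int = 8,              # number of frames aggregated per modality
    audio_offset: int = 0,        # modality-specific temporal offsets
    video_offset: int = -2,
) -> torch.Tensor:
    """Returns a single fused vector for frame t."""

    def clip(x: torch.Tensor, start: int) -> torch.Tensor:
        # Clamp the window so it stays inside the sequence, then mean-pool.
        start = max(0, min(start, x.size(0) - window))
        return x[start : start + window].mean(dim=0)

    a = clip(audio, t + audio_offset)
    v = clip(video, t + video_offset)
    return torch.cat([a, v], dim=-1)  # (d_audio + d_video,)


if __name__ == "__main__":
    audio, video = torch.randn(100, 40), torch.randn(100, 512)
    fused = windowed_fusion(audio, video, t=50)
    print(fused.shape)  # torch.Size([552])
```

Mean pooling is used only to keep the example short; any temporal encoder could take its place.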