UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion
Recognition
- URL: http://arxiv.org/abs/2211.11256v1
- Date: Mon, 21 Nov 2022 08:46:01 GMT
- Title: UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion
Recognition
- Authors: Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, Yongbin Li
- Abstract summary: Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors.
We propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models.
We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions.
- Score: 32.34485263348587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal sentiment analysis (MSA) and emotion recognition in conversation
(ERC) are key research topics for computers to understand human behaviors. From
a psychological perspective, emotions are the expression of affect or feelings
during a short period, while sentiments are formed and held for a longer
period. However, most existing works study sentiment and emotion separately and
do not fully exploit the complementary knowledge behind the two. In this paper,
we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that
unifies MSA and ERC tasks from features, labels, and models. We perform
modality fusion at the syntactic and semantic levels and introduce contrastive
learning between modalities and samples to better capture the difference and
consistency between sentiments and emotions. Experiments on four public
benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the
effectiveness of the proposed method, which achieves consistent improvements
over state-of-the-art methods.
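As a rough illustration of the contrastive objective between modalities mentioned above, here is a minimal sketch (assuming hypothetical text and audio encoder outputs, a feature dimension of 256, and a temperature of 0.07; this is not the authors' implementation): each sample's representation in one modality is pulled toward its counterpart in the other modality and pushed away from the other samples in the batch.

```python
import torch
import torch.nn.functional as F

def inter_modality_contrastive_loss(text_repr, audio_repr, temperature=0.07):
    """InfoNCE-style loss between two modality views of the same samples.
    text_repr, audio_repr: (batch, dim) outputs of hypothetical modality encoders.
    Each sample's other-modality view is the positive; the rest of the batch
    serves as negatives."""
    text_repr = F.normalize(text_repr, dim=-1)
    audio_repr = F.normalize(audio_repr, dim=-1)
    logits = text_repr @ audio_repr.t() / temperature   # (batch, batch) cosine similarities
    targets = torch.arange(text_repr.size(0), device=text_repr.device)
    # Symmetric objective: text -> audio and audio -> text directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for encoder outputs
loss = inter_modality_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

The abstract also mentions contrastive learning between samples; a sample-level variant would pick positives and negatives using the unified sentiment/emotion labels rather than modality pairs, which the sketch above does not cover.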
Related papers
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause [18.99103120856208]
We propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause.
UniMEEC reformulates the MERC and MECPE tasks as mask prediction problems and unifies them with a causal prompt template.
Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks.
arXiv Detail & Related papers (2024-03-30T15:59:17Z)
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) over the past two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- Multi-Task Learning and Adapted Knowledge Models for Emotion-Cause Extraction [18.68808042388714]
We present solutions that tackle both emotion recognition and emotion cause detection in a joint fashion.
Considering that common-sense knowledge plays an important role in understanding implicitly expressed emotions, we propose methods that incorporate adapted knowledge models.
We show performance improvement on both tasks when including common-sense reasoning and a multitask framework.
arXiv Detail & Related papers (2021-06-17T20:11:04Z)
- A Multi-Componential Approach to Emotion Recognition and the Effect of Personality [0.0]
This paper applies a componential framework with a data-driven approach to characterize emotional experiences evoked during movie watching.
The results suggest that differences between various emotions can be captured by a few (at least 6) latent dimensions.
Results show that a componential model with a limited number of descriptors is still able to predict the level of experienced discrete emotion.
arXiv Detail & Related papers (2020-10-22T01:27:23Z)
- COSMIC: COmmonSense knowledge for eMotion Identification in Conversations [95.71018134363976]
We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations.
We show that COSMIC achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets.
arXiv Detail & Related papers (2020-10-06T15:09:38Z)
- Temporal aggregation of audio-visual modalities for emotion recognition [0.5352699766206808]
We propose a multimodal fusion technique for emotion recognition that combines audio-visual modalities from a temporal window, with a different temporal offset for each modality (a rough sketch of this windowing idea appears after this entry).
Our proposed method outperforms other methods from the literature as well as human accuracy ratings.
arXiv Detail & Related papers (2020-07-08T18:44:15Z)
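A minimal sketch of the windowed, offset-based fusion idea summarized above (the window size, per-modality offsets, and mean pooling are illustrative assumptions, not the paper's exact configuration):

```python
import torch

def windowed_fusion(audio_feats, video_feats, window=5, audio_offset=0, video_offset=2):
    """Aggregate frame-level features of each modality over a sliding temporal
    window, applying a modality-specific temporal offset before fusion.
    audio_feats, video_feats: (time, dim) tensors of frame-level features."""
    def aggregate(x, offset):
        x = x[offset:]                              # shift this modality by its temporal offset
        return x.unfold(0, window, 1).mean(dim=-1)  # mean over each window -> (time', dim)
    a = aggregate(audio_feats, audio_offset)
    v = aggregate(video_feats, video_offset)
    t = min(a.size(0), v.size(0))                   # align lengths after the offsets
    return torch.cat([a[:t], v[:t]], dim=-1)        # fused features for an emotion classifier

# Toy usage with random frame-level features (100 frames, 40-dim audio, 128-dim video)
fused = windowed_fusion(torch.randn(100, 40), torch.randn(100, 128))
```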
This list is automatically generated from the titles and abstracts of the papers in this site.