Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
- URL: http://arxiv.org/abs/2504.07521v2
- Date: Thu, 17 Apr 2025 09:34:26 GMT
- Title: Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
- Authors: Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua
- Abstract summary: We propose Emotion Interpretation (EI), focusing on causal factors that drive emotional responses. Unlike traditional emotion recognition, EI tasks require reasoning about triggers instead of mere labeling. We present EIBench, a large-scale benchmark encompassing 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions.
- Score: 35.24458725308099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing emotion analysis emphasizes which emotion arises (e.g., happy, sad, angry) but neglects the deeper why. We propose Emotion Interpretation (EI), focusing on the causal factors, whether explicit (e.g., observable objects, interpersonal interactions) or implicit (e.g., cultural context, off-screen events), that drive emotional responses. Unlike traditional emotion recognition, EI tasks require reasoning about triggers instead of mere labeling. To facilitate EI research, we present EIBench, a large-scale benchmark encompassing 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions. Each instance demands rationale-based explanations rather than straightforward categorization. We further propose a Coarse-to-Fine Self-Ask (CFSA) annotation pipeline, which guides Vision-Language Models (VLLMs) through iterative question-answer rounds to yield high-quality labels at scale. Extensive evaluations of open-source and proprietary large language models under four experimental settings reveal consistent performance gaps, especially for more intricate scenarios, underscoring EI's potential to enrich empathetic, context-aware AI applications. Our benchmark and methods are publicly available at https://github.com/Lum1104/EIBench, offering a foundation for advanced multimodal causal analysis and next-generation affective computing.
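The abstract describes CFSA only at a high level. As a rough illustration, the Python sketch below shows one way an iterative self-ask annotation loop over a VLLM could be structured; the function `query_vllm`, the prompts, the round count, and the data layout are hypothetical placeholders, not the authors' implementation (the real pipeline is in the linked repository).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EISample:
    image_path: str
    emotion_label: str                         # e.g., "sad"
    qa_rounds: List[Tuple[str, str]] = field(default_factory=list)
    rationale: str = ""                        # final causal explanation

def query_vllm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a Vision-Language Model call."""
    return f"<model answer to: {prompt[:60]}...>"

def cfsa_annotate(image_path: str, emotion_label: str, n_rounds: int = 3) -> EISample:
    sample = EISample(image_path, emotion_label)
    # Coarse round: ask broadly what in the image could trigger the labeled emotion.
    question = f"Describe what in this image could make someone feel {emotion_label}."
    for _ in range(n_rounds):
        answer = query_vllm(image_path, question)
        sample.qa_rounds.append((question, answer))
        # Fine rounds: self-ask a narrower follow-up about explicit or implicit triggers.
        question = ("Given your previous answer, ask and answer a more specific question "
                    "about the cause (objects, interactions, context, off-screen events).")
    # Consolidate the question-answer history into a rationale-style label.
    history = " | ".join(answer for _, answer in sample.qa_rounds)
    sample.rationale = query_vllm(image_path, f"Summarize the emotional cause: {history}")
    return sample

if __name__ == "__main__":
    print(cfsa_annotate("example.jpg", "anxious").rationale)
```

In this sketch the coarse round surfaces candidate triggers and each fine round narrows them before a final summarization pass produces the rationale-style label; the actual CFSA prompts and stopping criteria may differ.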
Related papers
- Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification [56.974545305472304]
Most datasets for sentiment analysis lack the context in which an opinion was expressed, which is often crucial for emotion understanding, and are mainly limited to a few emotion categories.
We design an LLM-based data synthesis pipeline and leverage a large model, Mistral-7b, to generate training examples for more accessible, lightweight BERT-type encoder models.
We show that Emo Pillars models are highly adaptive to new domains when tuned to specific tasks such as GoEmotions, ISEAR, IEMOCAP, and EmoContext, reaching the SOTA performance on the first three.
arXiv Detail & Related papers (2025-04-23T16:23:17Z) - AI with Emotions: Exploring Emotional Expressions in Large Language Models [0.0]
Large Language Models (LLMs) role-play as agents answering questions with specified emotional states.
Russell's Circumplex model characterizes emotions along the sleepy-activated (arousal) and pleasure-displeasure (valence) axes.
Evaluation showed that the emotional states of the generated answers were consistent with the specifications.
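As a side note on the valence-arousal parameterization mentioned above, the toy Python sketch below places a few emotions on Russell's Circumplex axes and retrieves the nearest label for a given point; the coordinates are rough illustrative values, not figures from the cited paper.

```python
import math

# Illustrative (valence, arousal) coordinates in [-1, 1]; assumed for this sketch only.
CIRCUMPLEX = {
    "excited": ( 0.7,  0.7),
    "happy":   ( 0.8,  0.3),
    "calm":    ( 0.6, -0.6),
    "bored":   (-0.4, -0.7),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.7),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the listed emotion closest to a given valence-arousal point."""
    return min(CIRCUMPLEX, key=lambda e: math.dist((valence, arousal), CIRCUMPLEX[e]))

if __name__ == "__main__":
    # Mild pleasure with high activation maps to "excited" under these coordinates.
    print(nearest_emotion(0.5, 0.8))
```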
arXiv Detail & Related papers (2025-04-20T18:49:25Z) - EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models [27.195518991292488]
EmoBench-M is a novel benchmark designed to evaluate the emotional intelligence (EI) capability of Multimodal Large Language Models (MLLMs). Evaluations of both open-source and closed-source MLLMs on EmoBench-M reveal a significant performance gap between them and humans.
arXiv Detail & Related papers (2025-02-06T18:13:35Z) - Generative Emotion Cause Explanation in Multimodal Conversations [23.39751445330256]
We propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE).
It aims to generate a detailed explanation of the emotional cause behind a target utterance within a multimodal conversation scenario.
A novel approach, FAME-Net, is proposed that harnesses Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos.
arXiv Detail & Related papers (2024-11-01T09:16:30Z) - PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-Based Sentiment Analysis (ABSA) task.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z) - EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - EmoBench: Evaluating the Emotional Intelligence of Large Language Models [73.60839120040887]
EmoBench is a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine Emotional Intelligence (EI).
EmoBench includes a set of 400 hand-crafted questions in English and Chinese, which are meticulously designed to require thorough reasoning and understanding.
Our findings reveal a considerable gap between the EI of existing Large Language Models and the average human, highlighting a promising direction for future research.
arXiv Detail & Related papers (2024-02-19T11:48:09Z) - ECQED: Emotion-Cause Quadruple Extraction in Dialogs [37.66816413841564]
We present Emotion-Cause Quadruple Extraction in Dialogs (ECQED), which requires detecting emotion-cause utterance pairs and emotion and cause types.
We show that introducing the fine-grained emotion and cause features noticeably improves dialog generation.
arXiv Detail & Related papers (2023-06-06T19:04:30Z) - Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)