EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
- URL: http://arxiv.org/abs/2505.11405v1
- Date: Fri, 16 May 2025 16:14:08 GMT
- Title: EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
- Authors: Bohao Xing, Xin Liu, Guoying Zhao, Chengyu Liu, Xiaolan Fu, Heikki Kälviäinen
- Abstract summary: We introduce EmotionHallucer, the first benchmark for detecting and analyzing emotion hallucinations in MLLMs. Building on this, we assess emotion hallucinations from two dimensions: emotion psychology knowledge and real-world multimodal perception. We propose the PEP-MEK framework, which yields an average improvement of 9.90% in emotion hallucination detection across selected models.
- Score: 17.710835703681873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion understanding is a critical yet challenging task. Recent advances in Multimodal Large Language Models (MLLMs) have significantly enhanced their capabilities in this area. However, MLLMs often suffer from hallucinations, generating irrelevant or nonsensical content. To the best of our knowledge, despite the importance of this issue, there has been no dedicated effort to evaluate emotion-related hallucinations in MLLMs. In this work, we introduce EmotionHallucer, the first benchmark for detecting and analyzing emotion hallucinations in MLLMs. Unlike humans, whose emotion understanding stems from the interplay of biology and social learning, MLLMs rely solely on data-driven learning and lack innate emotional instincts. Fortunately, emotion psychology provides a solid foundation of knowledge about human emotions. Building on this, we assess emotion hallucinations from two dimensions: emotion psychology knowledge and real-world multimodal perception. To support robust evaluation, we utilize an adversarial binary question-answer (QA) framework, which employs carefully crafted basic and hallucinated pairs to assess the emotion hallucination tendencies of MLLMs. By evaluating 38 LLMs and MLLMs on EmotionHallucer, we reveal that: i) most current models exhibit substantial issues with emotion hallucinations; ii) closed-source models outperform open-source ones in detecting emotion hallucinations, and reasoning capability provides additional advantages; iii) existing models perform better in emotion psychology knowledge than in multimodal emotion perception. As a byproduct, these findings inspire us to propose the PEP-MEK framework, which yields an average improvement of 9.90% in emotion hallucination detection across selected models. Resources will be available at https://github.com/xxtars/EmotionHallucer.
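To make the adversarial binary QA protocol concrete, below is a minimal sketch of how such an evaluation could be scored. The pairing scheme, the example question pair, and the `ask_model` callable are assumptions inferred from the abstract, not the authors' released code (see the GitHub link above for the actual resources).

```python
# Minimal sketch of an adversarial binary QA evaluation in the spirit of
# EmotionHallucer. Everything here is an assumption inferred from the
# abstract: the pairing scheme, the example questions, and `ask_model`
# (a placeholder for any LLM/MLLM query function returning "yes"/"no").
from typing import Callable, List, Tuple

# Each item pairs a basic (factually grounded) question, whose correct
# answer is "yes", with a hallucinated counterpart, whose correct answer
# is "no".
QAPair = Tuple[str, str]

EXAMPLE_PAIRS: List[QAPair] = [
    ("Is fear typically triggered by a perceived threat?",
     "Is fear typically triggered by a perceived reward?"),
]

def evaluate(ask_model: Callable[[str], str], pairs: List[QAPair]) -> float:
    """Pair-level accuracy: a pair counts only if the model answers "yes"
    to the basic question AND "no" to the hallucinated one."""
    correct = 0
    for basic_q, hallucinated_q in pairs:
        says_yes = ask_model(basic_q).strip().lower().startswith("yes")
        says_no = ask_model(hallucinated_q).strip().lower().startswith("no")
        correct += int(says_yes and says_no)
    return correct / len(pairs)
```

Scoring at the pair level rather than per question would guard against models that answer "yes" indiscriminately, a common failure mode in binary hallucination probes.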
Related papers
- AI shares emotion with humans across languages and cultures [12.530921452568291]
We assess human-AI emotional alignment across linguistic-cultural groups and model families. Our analyses reveal that LLM-derived emotion spaces are structurally congruent with human perception. We show that model expressions can be stably and naturally modulated across distinct emotion categories.
arXiv Detail & Related papers (2025-06-11T14:42:30Z) - AI with Emotions: Exploring Emotional Expressions in Large Language Models [0.0]
Large Language Models (LLMs) role-play as agents answering questions with specified emotional states. Russell's Circumplex model characterizes emotions along the sleepy-activated (arousal) and pleasure-displeasure (valence) axes; a minimal sketch of this valence-arousal space appears after the list below. Evaluation showed that the emotional states of the generated answers were consistent with the specifications.
arXiv Detail & Related papers (2025-04-20T18:49:25Z) - EmoVerse: Exploring Multimodal Large Language Models for Sentiment and Emotion Understanding [5.3848462080869215]
We introduce Emotion Universe (EmoVerse), an MLLM designed to handle a broad spectrum of sentiment and emotion-related tasks. EmoVerse is capable of deeply analyzing the underlying causes of emotional states. We also introduce the Affective Multitask (AMT) dataset.
arXiv Detail & Related papers (2024-12-11T02:55:00Z) - MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z) - AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models [18.482881562645264]
This study is the first to explore the potential of Large Language Models (LLMs) in recognizing ambiguous emotions. We design zero-shot and few-shot prompting and incorporate past dialogue as context information for ambiguous emotion recognition.
arXiv Detail & Related papers (2024-09-26T23:25:21Z) - EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - The Good, The Bad, and Why: Unveiling Emotions in Generative AI [73.94035652867618]
We show that EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it.
EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain.
arXiv Detail & Related papers (2023-12-18T11:19:45Z) - Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z) - Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z) - Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference [44.126681295827794]
This study used a form of artificial intelligence known as large language models (LLMs) to assess whether language-based representations of emotion causally contribute to the AI's ability to generate inferences about the emotional meaning of novel situations.
Our findings provide a proof of concept that even an LLM can learn about emotions in the absence of sensory-motor representations, and they highlight the contribution of language-derived emotion-concept knowledge to emotion inference.
arXiv Detail & Related papers (2023-02-19T14:21:33Z) - Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
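As a rough illustration of Russell's Circumplex model referenced in the "AI with Emotions" entry above, the sketch below places a few emotion labels in the valence-arousal plane and maps a point to its nearest label. The coordinates are illustrative assumptions, not values from any of the listed papers.

```python
import math

# Illustrative valence-arousal coordinates in [-1, 1]^2 under Russell's
# Circumplex model (valence = pleasure-displeasure, arousal =
# activated-sleepy). The numbers are rough assumptions for demonstration,
# not values taken from any paper above.
CIRCUMPLEX = {
    "excited": (0.7, 0.7),
    "calm":    (0.6, -0.6),
    "angry":   (-0.7, 0.7),
    "sad":     (-0.6, -0.5),
}

def nearest_label(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to the closest labeled emotion."""
    return min(CIRCUMPLEX, key=lambda k: math.dist(CIRCUMPLEX[k], (valence, arousal)))

print(nearest_label(0.5, 0.8))  # -> "excited"
```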
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.