Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
- URL: http://arxiv.org/abs/2509.21950v1
- Date: Fri, 26 Sep 2025 06:30:39 GMT
- Title: Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
- Authors: Daiqing Wu, Dongbao Yang, Sicheng Zhao, Can Ma, Yu Zhou
- Abstract summary: We argue that this inconsistency stems partly from constraints in existing evaluation methods. We propose an Emotion Statement Judgment task that overcomes these constraints. We devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort.
- Score: 29.502292089901825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Multimodal Large Language Models (MLLMs) have achieved exceptional performance across diverse tasks, continually surpassing previous expectations regarding their capabilities. Nevertheless, their proficiency in perceiving emotions from images remains debated, with studies yielding divergent results in zero-shot scenarios. We argue that this inconsistency stems partly from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotations. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. Complementing this task, we devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort. Through systematically evaluating prevailing MLLMs, our study showcases their stronger performance in emotion interpretation and context-based emotion judgment, while revealing relative limitations in comprehending perception subjectivity. When compared to humans, even top-performing MLLMs like GPT-4o demonstrate substantial performance gaps, underscoring key areas for future improvement. By developing a fundamental evaluation framework and conducting a comprehensive MLLM assessment, we hope this work contributes to advancing emotional intelligence in MLLMs. Project page: https://github.com/wdqqdw/MVEI.
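To make the proposed evaluation concrete, the sketch below shows one way an Emotion Statement Judgment loop could be scored: the model sees an image and an emotion-centric statement, answers true or false, and accuracy is computed against ground-truth labels. This is a minimal illustration assuming a binary true/false protocol; the `EmotionStatementItem` fields, the `query_mllm` stub, and the prompt wording are hypothetical placeholders, not the authors' released pipeline (see the project page above for that).

```python
# Minimal sketch of scoring an Emotion Statement Judgment task.
# All names and the prompt format are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class EmotionStatementItem:
    image_path: str   # image shown to the model
    statement: str    # hypothetical emotion-centric statement about the image
    label: bool       # ground-truth judgment for the statement

def judge_statement(query_mllm: Callable[[str, str], str],
                    item: EmotionStatementItem) -> bool:
    """Ask the MLLM for a binary true/false judgment on one statement."""
    prompt = (
        "Consider the image and the following statement about the emotions it conveys:\n"
        f'"{item.statement}"\n'
        "Answer with exactly one word: true or false."
    )
    reply = query_mllm(item.image_path, prompt)
    return reply.strip().lower().startswith("true")

def statement_judgment_accuracy(query_mllm: Callable[[str, str], str],
                                items: Iterable[EmotionStatementItem]) -> float:
    """Fraction of statements where the model's judgment matches the label."""
    items = list(items)
    correct = sum(judge_statement(query_mllm, it) == it.label for it in items)
    return correct / len(items)

# Example with a trivial stub in place of a real MLLM call:
demo = [EmotionStatementItem("crowd.jpg", "The crowd appears joyful.", True)]
print(statement_judgment_accuracy(lambda img, prompt: "true", demo))  # 1.0
```

In practice, `query_mllm` would wrap whichever MLLM API is under evaluation; the paper's pipeline additionally automates constructing the statements themselves, which this sketch takes as given.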
Related papers
- EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models [62.3977734456669]
We propose Reflective Reinforcement Learning for Emotional Reasoning (EMO-R3), a framework designed to enhance the emotional reasoning ability of Multimodal Large Language Models (MLLMs). We introduce Structured Emotional Thinking to guide the model to perform step-by-step emotional reasoning in a structured and interpretable manner, and design a Reflective Emotional Reward that enables the model to re-evaluate its reasoning based on visual-text consistency and emotional coherence. EMO-R3 significantly improves both the interpretability and emotional intelligence of MLLMs, achieving superior performance across multiple visual emotional understanding benchmarks.
arXiv Detail & Related papers (2026-02-27T08:42:52Z)
- Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier [53.55996102181836]
We propose the Emotional Rationale Verifier (ERV) and an Explanation Reward. Our method guides the model to produce reasoning that is explicitly consistent with the target emotion. We show that our approach not only enhances alignment between explanation and prediction but also empowers MLLMs to deliver emotionally coherent, trustworthy interactions.
arXiv Detail & Related papers (2025-10-27T16:40:17Z)
- Fluent but Unfeeling: The Emotional Blind Spots of Language Models [1.248728117157669]
A critical gap remains in evaluating whether Large Language Models (LLMs) align with human emotions at a fine-grained level. We introduce Express, a benchmark dataset curated from Reddit communities featuring 251 fine-grained, self-disclosed emotion labels. Our comprehensive evaluation framework examines predicted emotion terms and decomposes them into eight basic emotions using established emotion theories.
arXiv Detail & Related papers (2025-09-11T16:31:13Z)
- MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models [108.61337743051483]
We present MME-Emotion, a systematic benchmark that assesses both the emotional understanding and reasoning capabilities of MLLMs. MME-Emotion contains over 6,000 curated video clips with task-specific question-answering (QA) pairs, spanning broad scenarios to formulate eight emotional tasks. It incorporates a holistic evaluation suite with hybrid metrics for emotion recognition and reasoning, analyzed through a multi-agent system framework.
arXiv Detail & Related papers (2025-08-11T03:14:55Z)
- Don't Get Too Excited -- Eliciting Emotions in LLMs [1.8399318639816038]
This paper investigates the challenges of affect control in large language models (LLMs). We evaluate state-of-the-art open-weight LLMs to assess their affective expressive range, quantifying the models' capacity to express a wide spectrum of emotions and how their expression fluctuates during interactions.
arXiv Detail & Related papers (2025-03-04T10:06:41Z)
- Evaluating Vision-Language Models for Emotion Recognition [1.7409710986849658]
We present the first comprehensive evaluation of Large Vision-Language Models (VLMs) for recognizing evoked emotions from images. Through several experiments, we identify important factors on which emotion recognition performance depends, and characterize the various errors VLMs make in the process.
arXiv Detail & Related papers (2025-02-08T18:25:31Z)
- EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents [57.4686961979566]
EmbodiedEval is a comprehensive and interactive evaluation benchmark for MLLMs with embodied tasks. It covers a broad spectrum of existing embodied AI tasks with significantly enhanced diversity. We evaluated state-of-the-art MLLMs on EmbodiedEval and found that they fall significantly short of human-level performance on embodied tasks.
arXiv Detail & Related papers (2025-01-21T03:22:10Z)
- Retrieving Implicit and Explicit Emotional Events Using Large Language Models [4.245183693179267]
Large language models (LLMs) have garnered significant attention in recent years due to their impressive performance. This study investigates LLMs' ability to retrieve implicit and explicit emotional events in commonsense contexts.
arXiv Detail & Related papers (2024-10-24T19:56:28Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z)