EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
- URL: http://arxiv.org/abs/2602.23802v1
- Date: Fri, 27 Feb 2026 08:42:52 GMT
- Title: EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
- Authors: Yiyang Fang, Wenke Huang, Pei Fu, Yihao Yang, Kehua Su, Zhenbo Luo, Jian Luan, Mang Ye
- Abstract summary: We propose Reflective Reinforcement Learning for Emotional Reasoning (EMO-R3), a framework designed to enhance the emotional reasoning ability of Multimodal Large Language Models (MLLMs). We introduce Structured Emotional Thinking to guide the model to perform step-by-step emotional reasoning in a structured and interpretable manner, and design a Reflective Emotional Reward that enables the model to re-evaluate its reasoning based on visual-text consistency and emotional coherence. EMO-R3 significantly improves both the interpretability and emotional intelligence of MLLMs, achieving superior performance across multiple visual emotional understanding benchmarks.
- Score: 62.3977734456669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual reasoning and understanding tasks but still struggle to capture the complexity and subjectivity of human emotions. Existing approaches based on supervised fine-tuning often suffer from limited generalization and poor interpretability, while reinforcement learning methods such as Group Relative Policy Optimization fail to align with the intrinsic characteristics of emotional cognition. To address these challenges, we propose Reflective Reinforcement Learning for Emotional Reasoning (EMO-R3), a framework designed to enhance the emotional reasoning ability of MLLMs. Specifically, we introduce Structured Emotional Thinking to guide the model to perform step-by-step emotional reasoning in a structured and interpretable manner, and design a Reflective Emotional Reward that enables the model to re-evaluate its reasoning based on visual-text consistency and emotional coherence. Extensive experiments demonstrate that EMO-R3 significantly improves both the interpretability and emotional intelligence of MLLMs, achieving superior performance across multiple visual emotional understanding benchmarks.
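The abstract contrasts EMO-R3 with Group Relative Policy Optimization (GRPO) and describes a reward that adds a reflection signal. As a rough illustration only, the sketch below shows the group-relative advantage normalization commonly associated with GRPO, fed by a hypothetical composite reward. The function names, the weights, the `reflection_score` input, and the use of the population standard deviation are all assumptions made for this example; the abstract does not specify how the Reflective Emotional Reward is computed.

```python
from statistics import mean, pstdev


def combined_reward(answer_correct, format_ok, reflection_score, w_reflect=0.5):
    """Hypothetical composite reward for one sampled response.

    answer_correct:   whether the predicted emotion label matches the target
    format_ok:        whether the response follows the structured template
    reflection_score: assumed score in [0, 1] for visual-text consistency and
                      emotional coherence (how it is obtained is not stated
                      in the abstract)
    The weights 1.0 / 0.5 / w_reflect are illustrative, not from the paper.
    """
    return float(answer_correct) + 0.5 * float(format_ok) + w_reflect * reflection_score


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is only credited for being better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses sampled for the same image-question pair.
    group = [
        combined_reward(True, True, 0.9),
        combined_reward(True, False, 0.4),
        combined_reward(False, True, 0.6),
        combined_reward(False, False, 0.1),
    ]
    for reward, advantage in zip(group, group_relative_advantages(group)):
        print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Because the advantages are centered within each group, a group whose responses all receive the same reward contributes no learning signal, which is one reason the design of the reward terms matters.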
Related papers
- E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG. E^2-LLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline. Experiments on the dataset across seven emotion categories demonstrate that E^2-LLM achieves excellent performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z)
- A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction [50.05919688888947]
This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation.
arXiv Detail & Related papers (2026-01-08T14:07:30Z)
- Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Language Models [6.810484095299127]
Emotional support is a core capability in human-AI interaction, with applications including psychological counseling, role play, and companionship. Existing evaluations of large language models (LLMs) often rely on short, static dialogues and fail to capture the dynamic and long-term nature of emotional support. Our framework constructs a large-scale benchmark consisting of 328 emotional contexts and 1,152 disturbance events, simulating realistic emotional shifts under evolving dialogue scenarios.
arXiv Detail & Related papers (2025-11-12T05:47:28Z)
- Unraveling Emotions with Pre-Trained Models [40.463050040722855]
This work compares the effectiveness of fine-tuning and prompt engineering in emotion detection in three scenarios. Experimental tests attain metrics above 70% with a fine-tuned pre-trained model for emotion recognition. These advancements improve sentiment analysis, human-computer interaction, and understanding of user behavior across various domains.
arXiv Detail & Related papers (2025-10-22T15:13:52Z)
- MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models [108.61337743051483]
We present MME-Emotion, a systematic benchmark that assesses both emotional understanding and reasoning capabilities of MLLMs. MME-Emotion contains over 6,000 curated video clips with task-specific questioning-answering (QA) pairs, spanning broad scenarios to formulate eight emotional tasks. It incorporates a holistic evaluation suite with hybrid metrics for emotion recognition and reasoning, analyzed through a multi-agent system framework.
arXiv Detail & Related papers (2025-08-11T03:14:55Z)
- Emotion-Qwen: A Unified Framework for Emotion and Vision Understanding [26.36195886824082]
Emotion-Qwen is a unified multimodal framework designed to simultaneously enable robust emotion understanding and preserve general reasoning capabilities. We develop the Video Emotion Reasoning dataset, a large-scale bilingual resource containing over 40K video clips annotated with detailed context-aware emotional descriptions.
arXiv Detail & Related papers (2025-05-10T16:15:26Z)
- Don't Get Too Excited -- Eliciting Emotions in LLMs [1.8399318639816038]
This paper investigates the challenges of affect control in large language models (LLMs). We evaluate state-of-the-art open-weight LLMs to assess their affective expressive range. We quantify the models' capacity to express a wide spectrum of emotions and how they fluctuate during interactions.
arXiv Detail & Related papers (2025-03-04T10:06:41Z)
- From Rational Answers to Emotional Resonance: The Role of Controllable Emotion Generation in Language Models [16.350658746140788]
Large language models (LLMs) struggle to express emotions in a consistent, controllable, and contextually appropriate manner. We propose a controllable emotion generation framework based on Emotion Vectors (EVs). Our method enables fine-grained, continuous modulation of emotional tone without any additional training or architectural modification.
arXiv Detail & Related papers (2025-02-06T13:38:57Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought [50.13429055093534]
Large Language Models (LLMs) have shown remarkable performance in various emotion recognition tasks.
We propose the Emotional Chain-of-Thought (ECoT) to enhance the performance of LLMs on various emotional generation tasks.
arXiv Detail & Related papers (2024-01-12T16:42:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.