RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
- URL: http://arxiv.org/abs/2507.03112v1
- Date: Thu, 03 Jul 2025 18:33:18 GMT
- Title: RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
- Authors: Peisong Wang, Ruotian Ma, Bang Zhang, Xingyu Chen, Zhiwei He, Kang Luo, Qingsong Lv, Qingxuan Jiang, Zheng Xie, Shanyi Wang, Yuan Li, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li
- Abstract summary: Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. We introduce RLVER, the first end-to-end reinforcement learning framework that leverages verifiable emotion rewards from simulated users. Our results show that RLVER is a practical route toward emotionally intelligent and broadly capable language agents.
- Score: 67.46032287312339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. While reinforcement learning from verifiable rewards (RLVR) has advanced in other domains, its application to dialogue, especially for emotional intelligence, remains underexplored. In this work, we introduce RLVER, the first end-to-end reinforcement learning framework that leverages verifiable emotion rewards from simulated users to cultivate higher-order empathetic abilities in LLMs. Within this framework, self-consistent affective simulated users engage in dialogue rollouts and produce deterministic emotion scores during conversations, serving as reward signals to guide the LLM's learning. Fine-tuning the publicly available Qwen2.5-7B-Instruct model with PPO boosts its Sentient-Benchmark score from 13.3 to 79.2 while largely preserving mathematical and coding competence. Extensive experiments reveal that: (i) RLVER consistently improves multiple dialogue capabilities; (ii) thinking and non-thinking models show distinct trends: thinking models excel in empathy and insight, while non-thinking models favor action; (iii) GRPO often yields stable gains, while PPO can push certain capabilities to a higher ceiling; (iv) more challenging environments are not always better; moderate ones can yield stronger outcomes. Our results show that RLVER is a practical route toward emotionally intelligent and broadly capable language agents.
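The abstract specifies the training signal but not the simulated user's interface. Below is a minimal sketch of how such a verifiable emotion-reward loop could look; `SimulatedUser`, its keyword heuristic, and the reward definition are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the verifiable emotion-reward loop described above.
# SimulatedUser and its keyword heuristic are hypothetical placeholders:
# the paper specifies only that a self-consistent affective simulated
# user produces deterministic emotion scores during the conversation.

from dataclasses import dataclass, field

@dataclass
class SimulatedUser:
    persona: str
    emotion: float = 0.0                      # running score in [-1, 1]
    history: list = field(default_factory=list)

    def react(self, agent_utterance: str) -> float:
        """Deterministically update the emotion score after one agent turn.
        A real implementation would use a fixed simulated-user LLM; the
        keyword rule here is purely for illustration."""
        delta = 0.1 if "understand" in agent_utterance.lower() else -0.05
        self.emotion = max(-1.0, min(1.0, self.emotion + delta))
        self.history.append((agent_utterance, self.emotion))
        return self.emotion

def rollout_reward(agent_replies: list[str], user: SimulatedUser) -> float:
    """Scalar reward for one dialogue rollout: the simulated user's final
    emotion score, which PPO then maximizes."""
    for reply in agent_replies:
        user.react(reply)
    return user.emotion

user = SimulatedUser(persona="stressed student the night before an exam")
print(rollout_reward(["I understand how stressful that must feel.",
                      "Let's break the revision plan into small steps."],
                     user))
```

In the actual framework, the deterministic emotion score would come from a fixed, self-consistent simulated-user model rather than a keyword rule; PPO then treats the rollout-level score as the reward to maximize.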
Related papers
- MeRF: Motivation-enhanced Reinforcement Finetuning for Large Reasoning Models [95.6332110724999]
Motivation-enhanced Reinforcement Finetuning (MeRF) is an intuitive yet effective method for enhancing reinforcement learning of Large Language Models (LLMs). MeRF directly injects the reward specification into the prompt, which serves as in-context motivation for the model to improve its responses with awareness of the optimization objective. Empirical evaluations on the Knights and Knaves (K&K) logic-puzzle reasoning benchmark demonstrate that MeRF achieves substantial performance gains over baselines.
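As a rough illustration of this in-context motivation idea, one could prepend the reward specification to each task prompt; the reward text and puzzle below are assumed examples, not the paper's exact specification.

```python
# Illustrative MeRF-style prompt construction: the reward specification
# is injected into the prompt as in-context motivation. REWARD_SPEC and
# the puzzle are assumed examples, not the paper's exact text.

REWARD_SPEC = (
    "Scoring: you receive +1 if your final answer correctly labels every "
    "character as a knight or a knave, and -1 otherwise."
)

def build_merf_prompt(puzzle: str) -> str:
    """Prepend the reward specification to a Knights and Knaves puzzle."""
    return f"{REWARD_SPEC}\n\nPuzzle:\n{puzzle}\n\nAnswer:"

print(build_merf_prompt(
    "A says: 'B is a knave.' B says: 'We are both knights.'"
))
```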
arXiv Detail & Related papers (2025-06-23T10:37:57Z)
- Learning to Reason without External Rewards [100.27210579418562]
Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model's own confidence, termed self-certainty, as its sole reward signal.
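A hedged sketch of such a self-certainty reward is shown below: score a generation by how peaked the model's next-token distributions are, here via the mean KL divergence from the uniform distribution, which equals log|V| minus the per-token entropy. The paper's exact formulation may differ.

```python
# Hedged sketch of a self-certainty intrinsic reward: mean KL divergence
# from the uniform distribution to the model's next-token distribution,
# which equals log|V| minus the per-token entropy. The paper's exact
# formulation may differ.

import math
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) over the generated tokens.
    Returns a scalar that is larger when the model is more confident."""
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)   # per-token entropy
    log_v = math.log(logits.size(-1))              # entropy of uniform
    return (log_v - entropy).mean()

# Sharper (more confident) logits yield a larger intrinsic reward.
torch.manual_seed(0)
flat = torch.randn(5, 100)
print(self_certainty(flat).item(), self_certainty(flat * 10).item())
```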
arXiv Detail & Related papers (2025-05-26T07:01:06Z)
- Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z)
- AI with Emotions: Exploring Emotional Expressions in Large Language Models [0.0]
Large Language Models (LLMs) role-play as agents answering questions with specified emotional states. Russell's Circumplex model characterizes emotions along the sleepy-activated (arousal) and pleasure-displeasure (valence) axes. Evaluation showed that the emotional states of the generated answers were consistent with the specifications.
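As a toy illustration of the Circumplex model referenced here, emotions can be placed as points on the valence and arousal axes; the coordinates below are rough assumed values, not figures from the paper.

```python
# Toy placement of emotions on Russell's Circumplex axes. The
# (valence, arousal) coordinates are rough assumed values.

import math

CIRCUMPLEX = {                     # (valence, arousal) in [-1, 1]
    "excited": (0.7, 0.7),
    "content": (0.8, -0.3),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6, 0.8),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Label a (valence, arousal) point with the closest listed emotion."""
    return min(CIRCUMPLEX,
               key=lambda e: math.dist(CIRCUMPLEX[e], (valence, arousal)))

print(nearest_emotion(0.6, 0.6))   # -> "excited"
```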
arXiv Detail & Related papers (2025-04-20T18:49:25Z)
- Enhancing Collective Intelligence in Large Language Models Through Emotional Integration [0.2812395851874055]
This research investigates the integration of emotional diversity into Large Language Models (LLMs) to enhance collective intelligence. Inspired by the human wisdom-of-crowds phenomenon, we fine-tuned the DarkIdol-Llama-3.1-8B model using Google's GoEmotions dataset and Low-Rank Adaptation (LoRA) to simulate emotionally diverse responses.
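A minimal sketch of this kind of LoRA fine-tuning setup with the Hugging Face peft library is given below; the rank, scaling, target modules, and Hub model id are assumed example values, not the paper's reported settings.

```python
# Minimal sketch of a LoRA fine-tuning setup with Hugging Face peft.
# The Hub id, rank, scaling, and target modules are assumed example
# values, not the settings reported in the paper.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "DarkIdol-Llama-3.1-8B"        # placeholder; exact Hub id assumed
)

lora_cfg = LoraConfig(
    r=16,                          # low-rank dimension (assumed)
    lora_alpha=32,                 # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights train
```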
arXiv Detail & Related papers (2025-03-05T23:42:48Z)
- AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models [18.482881562645264]
This study is the first to explore the potential of Large Language Models (LLMs) in recognizing ambiguous emotions. We design zero-shot and few-shot prompting and incorporate past dialogue as context information for ambiguous emotion recognition.
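One plausible shape for such a few-shot, context-aware prompt is sketched below; the shots and ambiguity labels are assumptions for illustration, not the paper's templates.

```python
# Illustrative few-shot prompt for ambiguity-aware emotion recognition
# that folds prior dialogue turns in as context. The shots and labels
# are assumptions for illustration.

FEW_SHOT = [
    ("I guess it's fine, whatever.", "ambiguous: resignation or mild anger"),
    ("I can't believe this happened!", "ambiguous: surprise or anger"),
]

def build_prompt(history: list[str], utterance: str) -> str:
    """Compose few-shot examples, dialogue context, and the target turn."""
    shots = "\n".join(f"Utterance: {u}\nEmotion: {e}" for u, e in FEW_SHOT)
    context = "\n".join(history)
    return (f"{shots}\n\nDialogue so far:\n{context}\n"
            f"Utterance: {utterance}\nEmotion:")

print(build_prompt(["A: How was the interview?"], "B: Oh... it was something."))
```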
arXiv Detail & Related papers (2024-09-26T23:25:21Z)
- An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents [0.40964539027092906]
This study tests the capabilities of large language models to solve emotional intelligence tasks and to simulate emotions.
It presents and evaluates a new chain-of-emotion architecture for emotion simulation within video games, based on psychological appraisal research.
arXiv Detail & Related papers (2023-09-10T16:55:49Z)
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
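A hedged sketch of layer-wise plus temporal aggregation over frozen SSL representations (e.g., wav2vec 2.0-style hidden states) is given below; the layer count, dimensions, and mean-pooling choice are assumptions, not the authors' exact modules.

```python
# Hedged sketch of layer-wise plus temporal aggregation over frozen SSL
# representations (e.g., wav2vec 2.0-style hidden states). Layer count,
# dimensions, and mean pooling are assumptions, not the authors' modules.

import torch
import torch.nn as nn

class LayerwiseTemporalAggregator(nn.Module):
    def __init__(self, num_layers: int = 13):
        super().__init__()
        # One learnable weight per encoder layer, softmax-normalized.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """hidden_states: (num_layers, time, dim) from a frozen SSL model.
        Returns a single (dim,) utterance-level embedding."""
        w = torch.softmax(self.layer_logits, dim=0)
        mixed = (w[:, None, None] * hidden_states).sum(dim=0)  # (time, dim)
        return mixed.mean(dim=0)                               # pool time

agg = LayerwiseTemporalAggregator()
feats = torch.randn(13, 250, 768)     # 13 layers, 250 frames, 768 dims
print(agg(feats).shape)               # torch.Size([768])
```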
arXiv Detail & Related papers (2023-03-14T16:08:45Z)