PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
- URL: http://arxiv.org/abs/2601.10532v2
- Date: Fri, 16 Jan 2026 15:28:40 GMT
- Title: PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
- Authors: Chengbing Wang, Wuqiang Zheng, Yang Zhang, Fengbin Zhu, Junyi Cheng, Yi Xie, Wenjie Wang, Fuli Feng
- Abstract summary: Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. We propose Psychology-grounded Empathetic Reward Modeling (PERM) to address this limitation.
- Score: 45.377102925731826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. While Reinforcement Learning (RL) has been utilized to enhance empathy of LLMs, existing reward models typically evaluate empathy from a single perspective, overlooking the inherently bidirectional interaction nature of empathy between the supporter and seeker as defined by Empathy Cycle theory. To address this limitation, we propose Psychology-grounded Empathetic Reward Modeling (PERM). PERM operationalizes empathy evaluation through a bidirectional decomposition: 1) Supporter perspective, assessing internal resonation and communicative expression; 2) Seeker perspective, evaluating emotional reception. Additionally, it incorporates a bystander perspective to monitor overall interaction quality. Extensive experiments on a widely-used emotional intelligence benchmark and an industrial daily conversation dataset demonstrate that PERM outperforms state-of-the-art baselines by over 10%. Furthermore, a blinded user study reveals a 70% preference for our approach, highlighting its efficacy in generating more empathetic responses. Our code, dataset, and models are available at https://github.com/ZhengWwwq/PERM.
Related papers
- Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning [20.717092979679553]
KardiaBench is a large-scale user-grounded benchmark comprising 178,080 QA pairs across 22,080 conversations anchored to 671 real-world profiles. Kardia-R1 is a framework that trains models for interpretable, stepwise empathetic cognition. Our dataset and model will be released at https://github.com/JhCircle/Kardia-R1.
arXiv Detail & Related papers (2025-12-01T04:54:03Z)
- EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling [22.309957211042597]
Personality detection from text is commonly performed by analysing users' social media posts. We propose a novel self-supervised framework, EmoPerso, which improves personality detection through emotion-aware modelling.
arXiv Detail & Related papers (2025-09-02T15:57:26Z)
- Heartificial Intelligence: Exploring Empathy in Language Models [8.517406772939292]
Small and large language models consistently outperformed humans on cognitive empathy tasks. Despite their cognitive strengths, both small and large language models showed significantly lower affective empathy compared to human participants.
arXiv Detail & Related papers (2025-07-30T14:09:33Z)
- RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents [67.46032287312339]
Large language models (LLMs) excel at logical and algorithmic reasoning, yet their emotional intelligence (EQ) still lags far behind their cognitive prowess. We introduce RLVER, the first end-to-end reinforcement learning framework that leverages verifiable emotion rewards from simulated users. Our results show that RLVER is a practical route toward emotionally intelligent and broadly capable language agents.
arXiv Detail & Related papers (2025-07-03T18:33:18Z)
- The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support [14.137398642966138]
This paper investigates the capacity of small language models to generate empathetic responses for individuals with PTSD. Trauma-Informed Dialogue for Empathy (TIDE) is a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas.
arXiv Detail & Related papers (2025-05-21T03:32:46Z)
- Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z)
- Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs [65.93003087656754]
VisFactor is a benchmark that digitizes 20 vision-centric subtests from a well-established cognitive psychology assessment. We evaluate 20 frontier Multimodal Large Language Models (MLLMs) from GPT, Gemini, Claude, LLaMA, Qwen, and SEED families. The best-performing model achieves a score of only 25.19 out of 100, with consistent failures on tasks such as mental rotation, spatial relation inference, and figure-ground discrimination.
arXiv Detail & Related papers (2025-02-23T04:21:32Z)
- Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to-date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Towards Persona-Based Empathetic Conversational Models [58.65492299237112]
Empathetic conversational models have been shown to improve user satisfaction and task outcomes in numerous domains.
In Psychology, persona has been shown to be highly correlated to personality, which in turn influences empathy.
We propose a new task towards persona-based empathetic conversations and present the first empirical study on the impact of persona on empathetic responding.
arXiv Detail & Related papers (2020-04-26T08:51:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.