RE-LLM: Refining Empathetic Speech-LLM Responses by Integrating Emotion Nuance
- URL: http://arxiv.org/abs/2602.10716v1
- Date: Wed, 11 Feb 2026 10:23:44 GMT
- Title: RE-LLM: Refining Empathetic Speech-LLM Responses by Integrating Emotion Nuance
- Authors: Jing-Han Chen, Bo-Hao Su, Ya-Tse Wu, Chi-Chun Lee
- Abstract summary: We propose RE-LLM, a speech-LLM integrating dimensional emotion embeddings and auxiliary learning. Experiments show statistically significant gains in empathy metrics across three datasets.
- Score: 35.31585885627661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With generative AI advancing, empathy in human-AI interaction is essential. While prior work focuses on emotional reflection, emotional exploration, key to deeper engagement, remains overlooked. Existing LLMs rely on text, which captures limited emotion nuances. To address this, we propose RE-LLM, a speech-LLM integrating dimensional emotion embeddings and auxiliary learning. Experiments show statistically significant gains in empathy metrics across three datasets. RE-LLM relatively improves the Emotional Reaction score by 14.79% and 6.76% compared to text-only and speech-LLM baselines on ESD. Notably, it raises the Exploration score by 35.42% and 3.91% on IEMOCAP, 139.28% and 9.83% on ESD, and 60.95% and 22.64% on MSP-PODCAST. It also boosts unweighted accuracy by 5.4% on IEMOCAP, 2.3% on ESD, and 6.9% on MSP-PODCAST in speech emotion recognition. These results highlight the enriched emotional understanding and improved empathetic response generation of RE-LLM.
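The abstract reports two kinds of numbers: relative improvements over a baseline, and unweighted accuracy for speech emotion recognition. As a rough illustration of how such figures are commonly computed, the sketch below assumes the usual definitions: relative improvement as the percentage change over the baseline, and unweighted accuracy as the mean of per-class recall. The function names and all input values are hypothetical and are not taken from the paper; the abstract also does not state whether its unweighted accuracy gains are absolute or relative.

```python
from collections import defaultdict


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` over `baseline` (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


def unweighted_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recall, so every emotion class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels and predictions, for illustration only.
labels = ["angry", "angry", "angry", "happy", "sad", "sad"]
preds = ["angry", "angry", "happy", "happy", "sad", "angry"]

ua = unweighted_accuracy(labels, preds)        # averages recall over 3 classes
gain = relative_improvement(2.0, 2.5)          # hypothetical metric scores
print(f"UA = {ua:.4f}, relative gain = {gain:.1f}%")
```

Unweighted accuracy differs from plain accuracy whenever classes are imbalanced: in the toy data above, plain accuracy is 4 of 6 correct, while the per-class average weights the single "happy" sample as heavily as the three "angry" samples.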
Related papers
- Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs [6.14481021961242]
We propose decomposing ToM in language models by comparing steered versus baseline LLMs' activations. We find improved performance on belief attribution tasks (32.5% to 46.7% accuracy) is mediated by activations processing emotional content. This suggests that successful ToM abilities in LLMs are mediated by emotional understanding, not analytical reasoning.
arXiv Detail & Related papers (2025-11-19T21:56:00Z)
- RLAIF-SPA: Optimizing LLM-based Emotional Speech Synthesis via RLAIF [23.474332076771308]
Text-To-Speech synthesis has achieved near-human quality in neutral speech, but emotional expressiveness remains a challenge. We propose the RLAIF-SPA framework, incorporating a Reinforcement Learning from AI Feedback mechanism to employ Automatic Speech Recognition (ASR) and Large Language Model (LLM) techniques. Experiments on the LibriSpeech dataset show that RLAIF-SPA outperforms Chat-TTS, with a 26.1% reduction in WER, a 9.1% increase in SIM-O, and over 10% improvement in human evaluation.
arXiv Detail & Related papers (2025-10-16T12:40:37Z)
- Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction [83.88591755871734]
EmoRAG is a system designed to detect perceived emotions in text for SemEval-2025 Task 11, Subtask A: Multi-label Emotion Detection. We focus on predicting the perceived emotions of the speaker from a given text snippet, labeling it with emotions such as joy, sadness, fear, anger, surprise, and disgust.
arXiv Detail & Related papers (2025-06-04T19:41:24Z)
- APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation [71.26755736617478]
Empathetic response generation is designed to comprehend the emotions of others.
We develop a framework that combines retrieval augmentation and emotional support strategy integration.
Our framework can enhance the empathy ability of LLMs from both cognitive and affective empathy perspectives.
arXiv Detail & Related papers (2024-07-23T02:23:37Z)
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
arXiv Detail & Related papers (2024-06-17T03:01:22Z)
- Are Large Language Models More Empathetic than Humans? [14.18033127602866]
GPT-4 emerged as the most empathetic, marking an approximately 31% increase in responses rated as "Good" compared to the human benchmark.
Some LLMs are significantly better at responding to specific emotions compared to others.
arXiv Detail & Related papers (2024-06-07T16:33:43Z)
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z)
- End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings [0.0]
End-to-end deep learning systems for speech emotion recognition now achieve equivalent or even better results than conventional machine learning approaches.
We first trained and tested it on the widely used corpus accessible by the community, IEMOCAP.
We then used the same architecture on the real-life corpus, CEMO, composed of 440 dialogs (2h16m) from 485 speakers.
arXiv Detail & Related papers (2021-10-28T08:56:57Z)
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z)