SYNTHEMPATHY: A Scalable Empathy Corpus Generated Using LLMs Without Any Crowdsourcing
- URL: http://arxiv.org/abs/2502.17857v1
- Date: Tue, 25 Feb 2025 05:07:27 GMT
- Title: SYNTHEMPATHY: A Scalable Empathy Corpus Generated Using LLMs Without Any Crowdsourcing
- Authors: Run Chen, Jun Shin, Julia Hirschberg
- Abstract summary: We propose a data generation framework for developing a large corpus containing 105k empathetic responses to real-life situations. A base Mistral 7B model fine-tuned on our SYNTHEMPATHY corpus exhibits an increase in the average empathy score.
- Score: 4.405248499280186
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous research has shown that humans are more receptive towards language models that exhibit empathetic behavior. While empathy is essential for developing helpful dialogue agents, very few large corpora containing empathetic dialogues are available for fine-tuning LLMs. The few existing corpora have largely relied on crowdsourcing to simulate empathetic conversations, a process that is expensive, time-consuming, and not scalable to larger datasets. We propose a data generation framework for developing SYNTHEMPATHY, a large corpus containing 105k empathetic responses to real-life situations compiled through LLM generation. A base Mistral 7B model fine-tuned on our SYNTHEMPATHY corpus exhibits an increase in the average empathy score.
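The abstract does not spell out the generation pipeline, but the general shape of such a framework can be sketched: prompt a generator LLM for an empathetic reply to each real-life situation and store the resulting pairs for supervised fine-tuning. The `call_llm` stub, the prompt wording, and the output path below are illustrative assumptions, not the authors' actual setup.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a generator-model call (a hosted API or a local model)."""
    return "That sounds really hard; it makes sense that you feel overwhelmed."

def build_prompt(situation: str) -> str:
    # Ask the generator to acknowledge feelings rather than give advice.
    return (
        "You are a supportive listener. Write a brief, empathetic reply that "
        f"acknowledges the speaker's feelings:\n\n{situation}"
    )

def build_corpus(situations, path="synthempathy_sketch.jsonl"):
    """Write (situation, empathetic response) pairs as JSONL for later fine-tuning."""
    with open(path, "w", encoding="utf-8") as f:
        for situation in situations:
            response = call_llm(build_prompt(situation))
            f.write(json.dumps({"situation": situation, "response": response}) + "\n")

if __name__ == "__main__":
    build_corpus(["I failed my driving test for the second time today."])
```

A base model such as Mistral 7B could then be fine-tuned on these pairs with standard supervised fine-tuning tooling; the abstract reports an increase in the average empathy score after this step.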
Related papers
- Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models [19.450298798183166]
Empathetic response generation is a crucial task for creating more human-like and supportive conversational agents.
Existing methods face a core trade-off between the analytical depth of specialized models and the generative fluency of Large Language Models.
We propose TRACE, a novel framework that models empathy as a structured cognitive process by decomposing the task into a pipeline for analysis and synthesis.
arXiv Detail & Related papers (2025-09-26T04:20:37Z)
- Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models [38.5764934392601]
We propose Emotion Omni, a model that understands emotional content in user speech and generates empathetic responses.
Emotion Omni achieves comparable instruction-following ability without large-scale pretraining, while surpassing existing models in speech quality.
arXiv Detail & Related papers (2025-08-26T03:54:39Z)
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model [47.84522683404745]
We present OpenS2S, a fully open-source, transparent and end-to-end LSLM designed to enable empathetic speech interactions.
Based on our empathetic speech-to-text model BLSP-Emo, OpenS2S employs a streaming interleaved decoding architecture to achieve low-latency speech generation.
By leveraging large language models to generate empathetic content and controllable text-to-speech systems, we construct a scalable training corpus with rich paralinguistic diversity.
arXiv Detail & Related papers (2025-07-07T16:31:37Z)
- Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification [56.974545305472304]
Most datasets for sentiment analysis lack context in which an opinion was expressed, often crucial for emotion understanding, and are mainly limited by a few emotion categories.
We design an LLM-based data synthesis pipeline and leverage a large model, Mistral-7b, for the generation of training examples for more accessible, lightweight BERT-type encoder models.
We show that Emo Pillars models are highly adaptive to new domains when tuned to specific tasks such as GoEmotions, ISEAR, IEMOCAP, and EmoContext, reaching SOTA performance on the first three (a minimal distillation-style sketch follows this list).
arXiv Detail & Related papers (2025-04-23T16:23:17Z)
- Towards Anthropomorphic Conversational AI Part I: A Practical Framework [49.62013440962072]
We introduce a multi-module framework designed to replicate the key aspects of human intelligence involved in conversations.
In the second stage of our approach, these conversational data, after filtering and labeling, can serve as training and testing data for reinforcement learning.
arXiv Detail & Related papers (2025-02-28T03:18:39Z)
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [89.50691075011429]
Slow-thinking reasoning systems have garnered widespread attention by scaling the thinking time during inference.
There is also growing interest in adapting this capability to multimodal large language models (MLLMs).
In this paper, we explore a straightforward approach by fine-tuning a capable MLLM with a small amount of textual long-form thought data.
We find that these long-form reasoning processes, expressed in natural language, can be effectively transferred to MLLMs.
arXiv Detail & Related papers (2025-01-03T17:14:16Z)
- OmniBench: Towards The Future of Universal Omni-Language Models [63.16606414452612]
We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously.
Our main findings reveal that most OLMs exhibit critical limitations in instruction-following and reasoning capabilities within tri-modal contexts.
To address this gap, we curate an instruction tuning dataset of 84.5K training samples, OmniInstruct, for training OLMs to adapt to multimodal contexts.
arXiv Detail & Related papers (2024-09-23T17:59:05Z)
- Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue [25.89926022671521]
We generate a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset.
We find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along multiple textual properties.
arXiv Detail & Related papers (2024-09-12T18:00:18Z)
- APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation [71.26755736617478]
Empathetic response generation aims to comprehend the emotions of others.
We develop a framework that combines retrieval augmentation and emotional support strategy integration.
Our framework can enhance the empathy ability of LLMs from both cognitive and affective empathy perspectives.
arXiv Detail & Related papers (2024-07-23T02:23:37Z)
- EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization [9.934277461349696]
Empathetic response generation is a desirable aspect of conversational agents.
We propose a novel approach where we construct theory-driven preference datasets based on emotion grounding.
We show that LLMs can be aligned for empathetic response generation by preference optimization while retaining their general performance (a generic preference-optimization sketch follows this list).
arXiv Detail & Related papers (2024-06-27T10:41:22Z)
- AI-native Memory: A Pathway from LLMs Towards AGI [25.19572633670963]
Large language models (LLMs) have shown the world sparks of artificial general intelligence (AGI).
We envision a pathway from LLMs to AGI through the integration of memory.
As an intermediate stage, the memory will likely be in the form of natural language descriptions.
arXiv Detail & Related papers (2024-06-26T12:51:37Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Large Language Models Produce Responses Perceived to be Empathic [40.38391275905264]
Large Language Models (LLMs) generate empathic messages in response to posts describing common life experiences.
We showed human raters a variety of responses written by several models, and had people rate these responses on how empathic they seemed to be.
We found that LLM-generated responses were consistently rated as more empathic than human-written responses.
arXiv Detail & Related papers (2024-03-26T23:14:34Z)
- Harnessing Large Language Models' Empathetic Response Generation Capabilities for Online Mental Health Counselling Support [1.9336815376402723]
Large Language Models (LLMs) have demonstrated remarkable performance across various information-seeking and reasoning tasks.
This study sought to examine LLMs' capability to generate empathetic responses in conversations that emulate those in a mental health counselling setting.
We selected five LLMs: versions 3.5 and 4 of the Generative Pre-trained Transformer (GPT), Vicuna FastChat-T5, Pathways Language Model (PaLM) version 2, and Falcon-7B-Instruct.
arXiv Detail & Related papers (2023-10-12T03:33:06Z)
- Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication [88.52901763928045]
We propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor.
We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics.
arXiv Detail & Related papers (2021-06-22T14:02:33Z)
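For the Emo Pillars entry above, the summary describes synthesizing training examples with a large model and using them to train lightweight BERT-type encoders. A minimal sketch of that distillation-style step, assuming a generic BERT-base encoder and a toy synthetic batch (the model name, label set, and examples are illustrative, not the paper's actual data):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Toy LLM-synthesized (text, emotion) pairs standing in for a real synthetic corpus.
labels = ["joy", "sadness", "anger"]
synthetic = [("I finally got the internship!", 0), ("Nobody called me back.", 1)]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One supervised step on the synthetic batch; the model computes cross-entropy internally.
texts, targets = zip(*synthetic)
batch = tok(list(texts), padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=torch.tensor(targets))
out.loss.backward()
optimizer.step()
print(float(out.loss))
```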
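The EmPO entry above mentions aligning models for empathetic response generation via preference optimization over theory-driven preference pairs. Its exact objective is not given in this summary; below is a generic Direct Preference Optimization (DPO) loss over sequence log-probabilities, shown only to illustrate the kind of objective involved (the beta value and toy numbers are assumptions).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: prefer the chosen (e.g. more empathetic) response
    relative to a frozen reference model."""
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy sequence log-probabilities for a single (chosen, rejected) response pair.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-13.0]), torch.tensor([-14.2]))
print(float(loss))
```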