The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
- URL: http://arxiv.org/abs/2505.15065v2
- Date: Sat, 20 Sep 2025 08:07:48 GMT
- Title: The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
- Authors: Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
- Abstract summary: This paper investigates the capacity of small language models to generate empathetic responses for individuals with PTSD. Trauma-Informed Dialogue for Empathy (TIDE) is a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the capacity of small language models (0.5B-5B parameters) to generate empathetic responses for individuals with PTSD. We introduce Trauma-Informed Dialogue for Empathy (TIDE), a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas (https://huggingface.co/datasets/yenopoya/TIDE). Using frontier model outputs as ground truth, we evaluate eight small LLMs in zero-shot settings and after fine-tuning. Fine-tuning enhances empathetic capabilities, improving cosine similarity and perceived empathy, although gains vary across emotional scenarios and smaller models exhibit a "knowledge transfer ceiling." As expected, Claude 3.5 Sonnet consistently outperforms all models, but surprisingly, the smaller models often approach human-rated empathy levels. Demographic analyses showed that older adults favored responses that validated distress before offering support (p = .004), while graduate-educated users preferred emotionally layered replies in specific scenarios. Gender-based differences were minimal (p > 0.15), suggesting the feasibility of broadly empathetic model designs. This work offers insights into building resource-efficient, emotionally intelligent systems for mental health support.
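The abstract's automatic evaluation compares small-model responses against frontier-model reference outputs via cosine similarity in embedding space. A minimal sketch of that metric, assuming responses have already been encoded as vectors (the embedding source and the toy vectors below are illustrative, not the paper's actual pipeline):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: embeddings for a reference (frontier-model) response
# and a small-model response. In practice these would come from a
# sentence-embedding model; here they are made-up vectors.
reference = np.array([0.2, 0.7, 0.1, 0.6])
candidate = np.array([0.25, 0.65, 0.05, 0.55])

score = cosine_similarity(reference, candidate)
print(round(score, 3))  # close to 1.0 for near-identical responses
```

Higher scores indicate closer semantic alignment with the reference response; the paper pairs this automatic metric with human-rated empathy.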
Related papers
- Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue [53.95386201009769]
We introduce EmpathyEval, a descriptive natural-language-based evaluation model for assessing empathetic quality in spoken dialogues. We propose ReEmpathy, an end-to-end spoken language model that enhances empathetic dialogue through a novel Empathetic Self-Reflective Alternating Inference mechanism.
arXiv Detail & Related papers (2026-01-26T09:04:50Z) - PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models [45.377102925731826]
Large Language Models (LLMs) are increasingly deployed in human-centric applications, yet they often fail to provide substantive emotional support. We propose Psychology-grounded Empathetic Reward Modeling (PERM) to address this limitation.
arXiv Detail & Related papers (2026-01-15T15:56:55Z) - Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [92.93521294357058]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives. Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time. Int (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z) - Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. It provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z) - Modeling Challenging Patient Interactions: LLMs for Medical Communication Training [39.67477471073807]
This study proposes the use of Large Language Models (LLMs) to simulate authentic patient communication styles. We developed virtual patients (VPs) that embody nuanced emotional and conversational traits. Medical professionals evaluated these VPs, rating their authenticity on a 5-point Likert scale (accuser: 3.8 ± 1.0; rationalizer: 3.7 ± 0.8) and correctly identifying their styles.
arXiv Detail & Related papers (2025-03-28T09:04:10Z) - Investigating Large Language Models in Inferring Personality Traits from User Conversations [5.705775078773656]
Large Language Models (LLMs) are demonstrating remarkable human-like capabilities across diverse domains. This study evaluates whether GPT-4o and GPT-4o mini can infer Big Five personality traits and generate BFI-10 item scores from user conversations.
arXiv Detail & Related papers (2025-01-13T18:09:58Z) - MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious health conditions worldwide. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework for training language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z) - Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha).
Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
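The 0.72 Krippendorff's alpha reported above is an inter-rater agreement coefficient. A simplified sketch for interval-scaled ratings with no missing data (the full coefficient also handles missing values and other measurement levels; the rating matrix below is made up for illustration, not data from the paper):

```python
import itertools
import numpy as np

def krippendorff_alpha_interval(ratings: np.ndarray) -> float:
    """Simplified Krippendorff's alpha for interval data.

    ratings: shape (n_units, n_raters), complete (no missing values).
    """
    # Observed disagreement: mean squared difference between all
    # pairs of ratings within the same unit.
    within = [
        (a - b) ** 2
        for unit in ratings
        for a, b in itertools.combinations(unit, 2)
    ]
    d_o = np.mean(within)
    # Expected disagreement: mean squared difference between all
    # pairs of ratings pooled across units.
    pooled = ratings.ravel()
    between = [(a - b) ** 2 for a, b in itertools.combinations(pooled, 2)]
    d_e = np.mean(between)
    return float(1.0 - d_o / d_e)

# Hypothetical ratings: 3 stories, each scored by 2 annotators.
scores = np.array([[4.0, 4.0], [3.0, 3.0], [5.0, 4.0]])
print(round(krippendorff_alpha_interval(scores), 2))
```

Alpha is 1.0 for perfect agreement and approaches 0 when raters agree no more than chance would predict.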
arXiv Detail & Related papers (2024-06-18T14:51:54Z) - Multi-dimensional Evaluation of Empathetic Dialog Responses [4.580983642743026]
We propose a multi-dimensional empathy evaluation framework to measure both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective.
We find the two dimensions are inter-connected, while perceived empathy has high correlations with dialogue satisfaction levels.
arXiv Detail & Related papers (2024-02-18T00:32:33Z) - Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning [6.400704401007114]
This study investigates the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality.
When using GPT-4, mood change, empathy, and other dialogue qualities improved significantly.
arXiv Detail & Related papers (2024-01-29T08:53:41Z) - Harnessing Large Language Models' Empathetic Response Generation Capabilities for Online Mental Health Counselling Support [1.9336815376402723]
Large Language Models (LLMs) have demonstrated remarkable performance across various information-seeking and reasoning tasks.
This study sought to examine LLMs' capability to generate empathetic responses in conversations that emulate those in a mental health counselling setting.
We selected five LLMs: GPT-3.5 and GPT-4 (Generative Pre-trained Transformer), Vicuna FastChat-T5, Pathways Language Model (PaLM) version 2, and Falcon-7B-Instruct.
arXiv Detail & Related papers (2023-10-12T03:33:06Z) - Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate the anthropomorphic capabilities of Large Language Models (LLMs) using emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z) - CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation [59.8935454665427]
Empathetic dialogue models usually consider only the affective aspect or treat cognition and affection in isolation.
We propose the CASE model for empathetic dialogue generation.
arXiv Detail & Related papers (2022-08-18T14:28:38Z) - EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner.
We propose a method based on a pretrained transformer language model (T5).
We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z) - World Trade Center responders in their own words: Predicting PTSD symptom trajectories with AI-based language analyses of interviews [6.700088567524812]
This study tested the ability of AI-based language assessments to predict PTSD symptom trajectories among responders.
Cross-sectionally, greater use of depressive language (beta = 0.32) and of first-person singular pronouns (beta = 0.31) was associated with increased symptom severity.
Greater use of longer words (beta = -0.36) predicted improvement.
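The betas above are standardized regression coefficients linking language features to PTSD symptom severity. A generic sketch of how such a coefficient is computed from z-scored variables (the data below are synthetic, not the study's actual features or effect sizes):

```python
import numpy as np

def standardized_beta(x: np.ndarray, y: np.ndarray) -> float:
    """Standardized regression coefficient for a single predictor:
    the OLS slope after z-scoring both variables (equals Pearson r)."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    # Slope of the least-squares fit of zy on zx.
    return float(np.polyfit(zx, zy, 1)[0])

# Synthetic illustration: a language-feature score per participant
# and a symptom-severity score with a weak planted association.
rng = np.random.default_rng(0)
feature = rng.normal(size=50)
severity = 0.3 * feature + rng.normal(scale=1.0, size=50)

print(round(standardized_beta(feature, severity), 2))
```

In the study, multiple such features enter a regression jointly; with a single predictor the standardized beta reduces to the correlation coefficient, bounded in [-1, 1].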
arXiv Detail & Related papers (2020-11-12T15:57:23Z) - LAXARY: A Trustworthy Explainable Twitter Analysis Model for Post-Traumatic Stress Disorder Assessment [1.776746672434207]
We propose the LAXARY (Linguistic Analysis-based Explainable Inquiry) model to detect and represent the PTSD assessment of Twitter users.
First, we employ clinically validated survey tools to collect clinical PTSD assessment data from real Twitter users.
Then, we use the PTSD Linguistic Dictionary together with a machine learning model to fill in the survey tools, detecting the PTSD status and intensity of the corresponding Twitter users.
arXiv Detail & Related papers (2020-03-16T20:32:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.