Related papers: The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

URL: http://arxiv.org/abs/2505.15065v1
Date: Wed, 21 May 2025 03:32:46 GMT
Title: The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
Authors: Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
Abstract summary: We introduce TIDE, a dataset of 10,000 two-turn dialogues spanning 500 diverse PTSD client personas. All scenarios and reference responses were reviewed for realism and trauma sensitivity by a clinical psychologist specializing in PTSD. Our IRB-approved human evaluation and automatic metrics show that fine-tuning generally improves perceived empathy, but gains are highly scenario- and user-dependent.
Score: 10.942749627086476
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Can small language models with 0.5B to 5B parameters meaningfully engage in trauma-informed, empathetic dialogue for individuals with PTSD? We address this question by introducing TIDE, a dataset of 10,000 two-turn dialogues spanning 500 diverse PTSD client personas and grounded in a three-factor empathy model: emotion recognition, distress normalization, and supportive reflection. All scenarios and reference responses were reviewed for realism and trauma sensitivity by a clinical psychologist specializing in PTSD. We evaluate eight small language models before and after fine-tuning, comparing their outputs to a frontier model (Claude Sonnet 3.5). Our IRB-approved human evaluation and automatic metrics show that fine-tuning generally improves perceived empathy, but gains are highly scenario- and user-dependent, with smaller models facing an empathy ceiling. Demographic analysis shows older adults value distress validation and graduate-educated users prefer nuanced replies, while gender effects are minimal. We highlight the limitations of automatic metrics and the need for context- and user-aware system design. Our findings, along with the planned release of TIDE, provide a foundation for building safe, resource-efficient, and ethically sound empathetic AI to supplement, not replace, clinical mental health care.

Related papers

Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [92.93521294357058]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives. Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time. Int (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z)
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z)
Modeling Challenging Patient Interactions: LLMs for Medical Communication Training [39.67477471073807]
This study proposes the use of Large Language Models (LLMs) to simulate authentic patient communication styles. We developed virtual patients (VPs) that embody nuanced emotional and conversational traits. Medical professionals evaluated these VPs, rating their authenticity (accuser: $3.8 \pm 1.0$; rationalizer: $3.7 \pm 0.8$ on a 5-point Likert scale (from one to five)) and correctly identifying their styles.
arXiv Detail & Related papers (2025-03-28T09:04:10Z)
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases in the world. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z)
Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories. We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha). Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning [6.400704401007114]
This study investigates the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality. When using GPT-4, the amount of mood change, empathy, and other dialogue qualities improve significantly.
arXiv Detail & Related papers (2024-01-29T08:53:41Z)
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation [59.8935454665427]
Empathetic dialogue models usually consider only the affective aspect or treat cognition and affection in isolation. We propose the CASE model for empathetic dialogue generation.
arXiv Detail & Related papers (2022-08-18T14:28:38Z)
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner. We propose a method based on a transformer pretrained language model (T5). We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z)
World Trade Center responders in their own words: Predicting PTSD symptom trajectories with AI-based language analyses of interviews [6.700088567524812]
This study tested the ability of AI-based language assessments to predict PTSD symptom trajectories among responders. Cross-sectionally, greater depressive language (beta=0.32; p43) and first-person singular usage (beta=0.31; p44) were associated with increased symptom severity. Longer words usage (beta=-0.36; p7) predicted improvement.
arXiv Detail & Related papers (2020-11-12T15:57:23Z)
LAXARY: A Trustworthy Explainable Twitter Analysis Model for Post-Traumatic Stress Disorder Assessment [1.776746672434207]
We propose the LAXARY (Linguistic Analysis-based Explainable Inquiry) model to detect and represent PTSD assessment of Twitter users. First, we employ clinically validated survey tools for collecting clinical PTSD assessment data from real Twitter users. Then, we use the PTSD Linguistic Dictionary along with a machine learning model to fill in the survey tools, detecting the PTSD status of the corresponding Twitter users and its intensity.
arXiv Detail & Related papers (2020-03-16T20:32:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.