From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare
- URL: http://arxiv.org/abs/2601.15558v1
- Date: Thu, 22 Jan 2026 00:56:33 GMT
- Title: From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare
- Authors: Man Luo, Bahareh Harandizadeh, Amara Tariq, Halim Abbas, Umar Ghaffar, Christopher J Warren, Segun O. Kolade, Haidar M. Abdul-Muhsin
- Abstract summary: This study investigates how large language models (LLMs) can function as empathy editors. Experimental results show that LLM-edited responses significantly increase perceived empathy. These findings suggest that using LLMs as editorial assistants, rather than autonomous generators, offers a safer, more effective pathway to empathetic and trustworthy AI-assisted healthcare communication.
- Score: 2.933252902952646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical empathy is essential for patient care, but physicians must continually balance emotional warmth with factual precision under the cognitive and emotional constraints of clinical practice. This study investigates how large language models (LLMs) can function as empathy editors, refining physicians' written responses to enhance empathetic tone while preserving underlying medical information. More importantly, we introduce novel quantitative metrics, an Empathy Ranking Score and a MedFactChecking Score, to systematically assess both the emotional and factual quality of the responses. Experimental results show that LLM-edited responses significantly increase perceived empathy while preserving factual accuracy compared with fully LLM-generated outputs. These findings suggest that using LLMs as editorial assistants, rather than autonomous generators, offers a safer, more effective pathway to empathetic and trustworthy AI-assisted healthcare communication.
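The editor-rather-than-generator paradigm described in the abstract can be sketched as a prompt-construction step: instead of asking a model to write a reply from scratch, the physician's own draft is wrapped in an editing instruction. This is a minimal illustrative sketch; the function name and prompt wording are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of using an LLM as an "empathy editor": the model is
# instructed to rewrite the physician's draft for tone only, never content.
# Function name and prompt text are illustrative assumptions.
def build_empathy_edit_prompt(physician_reply: str) -> str:
    """Return a prompt asking an LLM to improve empathetic tone while
    preserving every medical fact in the physician's draft."""
    return (
        "You are an editor, not an author. Rewrite the physician's reply "
        "below so it sounds warmer and more empathetic. Do not add, remove, "
        "or alter any medical information, diagnosis, or instruction.\n\n"
        f"Physician reply:\n{physician_reply}\n\n"
        "Edited reply:"
    )

# Example: the draft is embedded verbatim, so a downstream fact-check can
# compare the edited output against the original medical content.
prompt = build_empathy_edit_prompt(
    "Your biopsy was benign. No further treatment is needed."
)
```

Keeping the original reply verbatim inside the prompt is what makes the paper's fact-preservation comparison possible: the edited output can be checked against the unaltered source text.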
Related papers
- Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs [12.287636586297756]
The recent proliferation of large language models (LLMs) holds the potential to revolutionize healthcare. This paper investigates the reliability and viability of using medical knowledge graphs (KGs) for the automated factuality evaluation of LLM-generated responses. We introduce FAITH, a framework designed to probe the strengths and limitations of this KG-based approach.
arXiv Detail & Related papers (2025-11-16T22:58:22Z) - The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses [35.62689455079826]
We evaluate two leading large language models (LLMs) on medical diagnostic scenarios. The results indicate that LLMs adapt explanations to socio-demographic variables and patient conditions. However, they also generate overly complex content and display biased affective empathy, leading to uneven accessibility and support.
arXiv Detail & Related papers (2025-11-02T13:01:07Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - Fact or Guesswork? Evaluating Large Language Models' Medical Knowledge with Structured One-Hop Judgments [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. We introduce the Medical Knowledge Judgment dataset (MKJ), a dataset derived from the Unified Medical Language System (UMLS), a comprehensive repository of standardized vocabularies and knowledge graphs. Through a binary classification framework, MKJ evaluates LLMs' grasp of fundamental medical facts by having them assess the validity of concise, one-hop statements.
arXiv Detail & Related papers (2025-02-20T05:27:51Z) - Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning [25.068780967617485]
Large language models (LLMs) offer a potential solution but struggle in real-world clinical interactions. We propose Ask Patients with Patience (APP), a multi-turn LLM-based medical assistant designed for grounded reasoning, transparent diagnoses, and human-centric interaction. APP enhances communication by eliciting user symptoms through empathetic dialogue, significantly improving accessibility and user engagement.
arXiv Detail & Related papers (2025-02-11T00:13:52Z) - Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions [9.327472312657392]
The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support.
This study investigates the question: "Can ChatGPT respond with a greater degree of empathy than responses typically offered by physicians?"
We collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT.
arXiv Detail & Related papers (2024-05-26T01:58:57Z) - Can AI Relate: Testing Large Language Model Response for Mental Health Support [23.97212082563385]
Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS.
We develop an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment.
arXiv Detail & Related papers (2024-05-20T13:42:27Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (as player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain [19.46334739319516]
We study how the dual logic ability of LLMs is affected during the privatization process in the medical domain.
Our results indicate that incorporating general domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also improves their accuracy.
arXiv Detail & Related papers (2023-09-08T08:20:46Z) - MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.