AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care
- URL: http://arxiv.org/abs/2602.05628v1
- Date: Thu, 05 Feb 2026 13:09:19 GMT
- Title: AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care
- Authors: Alastair Howcroft, Amber Bennett-Weston, Ahmad Khan, Joseff Griffiths, Simon Gay, Jeremy Howick,
- Abstract summary: Use of artificial intelligence (AI)-based chatbots in healthcare is rapidly expanding.<n>Some studies suggest AI chatbots can outperform human healthcare professionals (HCPs) in empathy.<n>We searched multiple databases for studies comparing AI chatbots with human HCPs on empathy measures.<n>Thirteen studies reported statistically significantly higher empathy ratings for AI, with only two studies situated in dermatology favouring human responses.
- Score: 0.10262304700896197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Empathy is widely recognized for improving patient outcomes, including reduced pain and anxiety and improved satisfaction, and its absence can cause harm. Meanwhile, use of artificial intelligence (AI)-based chatbots in healthcare is rapidly expanding, with one in five general practitioners using generative AI to assist with tasks such as writing letters. Some studies suggest AI chatbots can outperform human healthcare professionals (HCPs) in empathy, though findings are mixed and lack synthesis. Sources of data: We searched multiple databases for studies comparing AI chatbots using large language models with human HCPs on empathy measures. We assessed risk of bias with ROBINS-I and synthesized findings using random-effects meta-analysis where feasible, whilst avoiding double counting. Areas of agreement: We identified 15 studies (2023-2024). Thirteen studies reported statistically significantly higher empathy ratings for AI, with only two studies situated in dermatology favouring human responses. Of the 15 studies, 13 provided extractable data and were suitable for pooling. Meta-analysis of those 13 studies, all utilising ChatGPT-3.5/4, showed a standardized mean difference of 0.87 (95% CI, 0.54-1.20) favouring AI (P < .00001), roughly equivalent to a two-point increase on a 10-point scale. Areas of controversy: Studies relied on text-based assessments that overlook non-verbal cues and evaluated empathy through proxy raters. Growing points: Our findings indicate that, in text-only scenarios, AI chatbots are frequently perceived as more empathic than human HCPs. Areas timely for developing research: Future research should validate these findings with direct patient evaluations and assess whether emerging voice-enabled AI systems can deliver similar empathic advantages.
Related papers
- Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients [65.57160385098935]
Early detection of Diabetic Retinopathy can reduce the risk of vision loss by up to 95%.<n>We developed RAIS-DR, a Responsible AI System for DR screening that incorporates ethical principles across the AI lifecycle.<n>We evaluated RAIS-DR against the FDA-approved EyeArt system on a local dataset of 1,046 patients, unseen by both systems.
arXiv Detail & Related papers (2025-08-17T21:54:11Z) - Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis [0.9898534984111934]
We developed an extraction platform using large language models (LLMs) to automate data extraction.<n>We compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review.<n>Findings suggest AI variability depends more on interpretability than hallucination.
arXiv Detail & Related papers (2025-08-13T03:33:30Z) - Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial [5.429833789548265]
Large language model (LLM) chatbots show increasing promise in persuasive communication.<n>We enrolled 915 U.S. adults (ages 45-75) who had never completed colorectal cancer (CRC) screening.<n>Both AI interventions significantly increased stool test intentions by over 12 points (12.9-13.8/100), compared to a 7.5 gain for expert materials.
arXiv Detail & Related papers (2025-07-10T22:46:43Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluations twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset.<n>Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated [48.70176791365903]
This study explores how bias shapes the perception of AI versus human generated content.<n>We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z) - Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions [9.327472312657392]
The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support.
This study investigates the question Can ChatGPT respond with a greater degree of empathy than those typically offered by physicians?
We collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT.
arXiv Detail & Related papers (2024-05-26T01:58:57Z) - How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? [0.0]
This study examines the reliability of AI chatbots, specifically GPT 4.0, Claude 3 Opus, and Gemini Ultra 1.0, in predicting diseases from patient complaints in the emergency department.
Results suggest that GPT 4.0 achieves high accuracy with increased few-shot data, while Gemini Ultra 1.0 performs well with fewer examples, and Claude 3 Opus maintains consistent performance.
arXiv Detail & Related papers (2024-05-21T22:00:13Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and
Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks.<n>Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges.<n>Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z) - Mitigating Toxic Degeneration with Empathetic Data: Exploring the
Relationship Between Toxicity and Empathy [8.293498506807333]
Using empathetic data, we improve over recent work on controllable text generation that aims to reduce the toxicity of generated text.
We find we are able to dramatically reduce the size of fine-tuning data to 7.5-30k samples while at the same time making significant improvements over state-of-the-art toxicity mitigation.
arXiv Detail & Related papers (2022-05-15T09:37:15Z) - EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner.
We propose a method based on a transformer pretrained language model (T5)
We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.