Mapping Patient-Perceived Physician Traits from Nationwide Online Reviews with LLMs
- URL: http://arxiv.org/abs/2510.03997v1
- Date: Sun, 05 Oct 2025 02:16:35 GMT
- Title: Mapping Patient-Perceived Physician Traits from Nationwide Online Reviews with LLMs
- Authors: Junjie Luo, Rui Han, Arshana Welivita, Zeleikun Di, Jingfu Wu, Xuzhe Zhi, Ritu Agarwal, Gordon Gao
- Abstract summary: We present a large language model (LLM)-based pipeline that infers Big Five personality traits and five patient-oriented subjective judgments. The analysis encompasses 4.1 million patient reviews of 226,999 U.S. physicians from an initial pool of one million.
- Score: 3.364244912862208
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding how patients perceive their physicians is essential to improving trust, communication, and satisfaction. We present a large language model (LLM)-based pipeline that infers Big Five personality traits and five patient-oriented subjective judgments. The analysis encompasses 4.1 million patient reviews of 226,999 U.S. physicians from an initial pool of one million. We validate the method through multi-model comparison and human expert benchmarking, achieving strong agreement between human and LLM assessments (correlation coefficients 0.72-0.89) and external validity through correlations with patient satisfaction (r = 0.41-0.81, all p<0.001). National-scale analysis reveals systematic patterns: male physicians receive higher ratings across all traits, with largest disparities in clinical competence perceptions; empathy-related traits predominate in pediatrics and psychiatry; and all traits positively predict overall satisfaction. Cluster analysis identifies four distinct physician archetypes, from "Well-Rounded Excellent" (33.8%, uniformly high traits) to "Underperforming" (22.6%, consistently low). These findings demonstrate that automated trait extraction from patient narratives can provide interpretable, validated metrics for understanding physician-patient relationships at scale, with implications for quality measurement, bias detection, and workforce development in healthcare.
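The human-LLM agreement figures above (correlation coefficients 0.72-0.89) and the external-validity checks (r = 0.41-0.81) are standard correlations between two sets of trait scores. A minimal sketch of how such agreement could be computed, using invented 1-5 trait scores that stand in for the paper's actual annotations:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings of the same reviews by a human expert and an LLM
human = [4, 5, 3, 4, 2, 5, 3, 4]
llm   = [4, 5, 3, 5, 2, 4, 3, 4]
print(round(pearson_r(human, llm), 2))  # → 0.87
```

In practice the paper aggregates review-level LLM scores per physician before validating against satisfaction ratings, but the agreement statistic itself reduces to this computation.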
Related papers
- Advancing AI Trustworthiness Through Patient Simulation: Risk Assessment of Conversational Agents for Antidepressant Selection [1.90974530523188]
The patient simulator generates realistic, controllable patient interactions that vary across medical, linguistic, and behavioral dimensions. It allows annotators and an independent AI judge to assess agent performance, identify hallucinations and inaccuracies, and characterize risk patterns across diverse patient populations.
arXiv Detail & Related papers (2026-02-11T21:53:18Z) - A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care [5.167350493769989]
This is the first evaluation of an LLM-based medication safety review system on real NHS primary care data. We strategically sampled patients to capture a broad range of clinical complexity and medication safety risk. Our primary LLM system showed strong performance in recognising when a clinical issue is present.
arXiv Detail & Related papers (2025-12-24T11:58:49Z) - Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes [55.310195121276074]
We propose a Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework to predict diseases. It integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy. It provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
arXiv Detail & Related papers (2025-12-09T05:37:54Z) - DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services [49.70819009392778]
Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, multi-agent system for simulating realistic scenarios.
arXiv Detail & Related papers (2025-10-24T08:01:21Z) - The Dialogue That Heals: A Comprehensive Evaluation of Doctor Agents' Inquiry Capability [15.649293541650811]
We present MAQuE (Medical Agent Questioning Evaluation), the largest benchmark to date for automatic, comprehensive evaluation of medical multi-turn questioning. It features 3,000 realistically simulated patient agents that exhibit diverse linguistic patterns, cognitive limitations, emotional responses, and tendencies for passive disclosure. We also introduce a multi-faceted evaluation framework covering task success, inquiry proficiency, dialogue competence, inquiry efficiency, and patient experience.
arXiv Detail & Related papers (2025-09-29T15:52:36Z) - Sense of Self and Time in Borderline Personality. A Comparative Robustness Study with Generative AI [0.0]
This study examines the capacity of large language models (LLMs) to support qualitative analysis of first-person experience in Borderline Personality Disorder (BPD). Three LLMs were compared to mimic the interpretative style of the original investigators. Results showed variable overlap with the human analysis, from 0 percent in GPT to 42 percent in Claude and 58 percent in Gemini, and a low Jaccard coefficient (0.21-0.28). Gemini's output most closely resembled the human analysis, with validity scores significantly higher than GPT and Claude (p < 0.0001), and was judged as human by blinded experts.
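The Jaccard coefficient cited above measures set overlap between the themes identified by the human analysts and by each model. A minimal sketch using hypothetical theme labels (not the study's actual codes):

```python
def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| for two sets of labels."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical theme sets for illustration only
human_themes = {"emptiness", "identity diffusion", "time fragmentation",
                "abandonment"}
model_themes = {"emptiness", "abandonment", "impulsivity", "dissociation",
                "anger", "instability"}
print(jaccard(human_themes, model_themes))  # 2 shared / 8 total = 0.25
```

A coefficient in the reported 0.21-0.28 range thus means roughly one theme in four or five was shared between the human and model analyses.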
arXiv Detail & Related papers (2025-08-26T13:13:47Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration [17.11245701879749]
Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability. We validate XMedGPT across four pillars: multi-modal interpretability, uncertainty quantification, prognostic modeling, and rigorous benchmarking.
arXiv Detail & Related papers (2025-05-11T08:32:01Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases annotated with reasoning references. We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z) - Large Language Models for Patient Comments Multi-Label Classification [3.670008893193884]
This research explores leveraging Large Language Models (LLMs) to conduct Multi-label Text Classification (MLTC) of inpatient comments. GPT-4 Turbo was used to perform the classification. Within a prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were evaluated.
arXiv Detail & Related papers (2024-10-31T00:29:52Z) - Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par with, if not better than, common model-based metrics like BERTScore.
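The character-based Levenshtein metric found competitive above follows a standard dynamic-programming recurrence over edit operations. A minimal sketch (the text pair is illustrative, not from the study):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute / match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```

To compare a generated consultation note against a clinician's reference, the raw distance would typically be normalized by reference length so scores are comparable across notes.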
arXiv Detail & Related papers (2022-04-01T14:04:16Z) - What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z) - Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be trained in a distributed fashion and executed independently at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.