The Persona Paradox: Medical Personas as Behavioral Priors in Clinical Language Models
- URL: http://arxiv.org/abs/2601.05376v1
- Date: Thu, 08 Jan 2026 21:01:11 GMT
- Title: The Persona Paradox: Medical Personas as Behavioral Priors in Clinical Language Models
- Authors: Tassallah Abdullahi, Shrestha Ghosh, Hamish S Fraser, Daniel León Tramontini, Adeel Abbasi, Ghada Bourjeily, Carsten Eickhoff, Ritambhara Singh
- Abstract summary: Personas function as behavioral priors that introduce context-dependent trade-offs rather than guarantees of safety or expertise.
- Score: 18.902372087770562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Persona conditioning can be viewed as a behavioral prior for large language models (LLMs) and is often assumed to confer expertise and improve safety in a monotonic manner. However, its effects on high-stakes clinical decision-making remain poorly characterized. We systematically evaluate persona-based control in clinical LLMs, examining how professional roles (e.g., Emergency Department physician, nurse) and interaction styles (bold vs. cautious) influence behavior across models and medical tasks. We assess performance on clinical triage and patient-safety tasks using multidimensional evaluations that capture task accuracy, calibration, and safety-relevant risk behavior. We find systematic, context-dependent, and non-monotonic effects: medical personas improve performance in critical-care tasks, yielding gains of up to $\sim+20\%$ in accuracy and calibration, but degrade performance in primary-care settings by comparable margins. Interaction style modulates risk propensity and sensitivity, but the effect is highly model-dependent. While aggregated LLM-judge rankings favor medical over non-medical personas in safety-critical cases, human clinicians show moderate agreement on safety compliance (average Cohen's $\kappa = 0.43$) but report low confidence in 95.9% of their reasoning-quality judgments. Our work shows that personas function as behavioral priors that introduce context-dependent trade-offs rather than guarantees of safety or expertise. The code is available at https://github.com/rsinghlab/Persona_Paradox.
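As a concrete illustration of the evaluation loop the abstract describes, here is a minimal sketch, assuming a chat-style model behind an `ask(system, user)` callable that returns an answer string and a self-reported confidence. The role strings, style instructions, ECE as the calibration metric, and the toy triage item are all illustrative assumptions, not the authors' actual prompts, metrics, or data.

```python
# Minimal persona-conditioning harness (sketch; names and prompts hypothetical).
from itertools import product

ROLES = [None, "an Emergency Department physician", "a nurse"]  # None = no persona
STYLES = {
    "bold": "Answer decisively and commit to a single recommendation.",
    "cautious": "Flag uncertainty explicitly and err on the side of safety.",
}

def persona_prompt(role, style):
    """Compose a system prompt from a professional role and an interaction style."""
    return (f"You are {role}. " if role else "") + STYLES[style]

def expected_calibration_error(confs, correct, n_bins=10):
    """Binned ECE: bin-size-weighted |mean accuracy - mean confidence| per bin."""
    n, ece = len(confs), 0.0
    for b in range(n_bins):
        idx = [i for i, c in enumerate(confs)
               if b / n_bins <= c < (b + 1) / n_bins or (c == 1.0 and b == n_bins - 1)]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            avg_conf = sum(confs[i] for i in idx) / len(idx)
            ece += len(idx) / n * abs(acc - avg_conf)
    return ece

def evaluate(ask, items):
    """Score every role x style condition on (question, gold-answer) pairs."""
    results = {}
    for role, style in product(ROLES, STYLES):
        preds = [ask(persona_prompt(role, style), q) for q, _ in items]
        correct = [int(ans == gold) for (ans, _), (_, gold) in zip(preds, items)]
        confs = [conf for _, conf in preds]
        results[(role, style)] = {
            "accuracy": sum(correct) / len(correct),
            "ece": expected_calibration_error(confs, correct),
        }
    return results

# Usage with a stand-in model that ignores the persona and always answers the same:
demo = evaluate(lambda system, q: ("triage: red", 0.9),
                [("Chest pain, BP 80/50 -- triage level?", "triage: red")])
```

Clinician inter-rater agreement of the kind the abstract reports (Cohen's $\kappa$) could likewise be computed from paired labels, e.g. with `sklearn.metrics.cohen_kappa_score`.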
Related papers
- Beyond Accuracy: Risk-Sensitive Evaluation of Hallucinated Medical Advice [0.1609950046042424]
We propose a risk-sensitive evaluation framework that quantifies hallucinations through the presence of risk-bearing language. We apply this framework to three instruction-tuned language models using controlled patient-facing prompts designed as safety stress tests.
arXiv Detail & Related papers (2026-02-07T02:25:44Z)
- Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios [9.865786198063644]
The transition of Large Language Models (LLMs) from passive knowledge retrievers to autonomous clinical agents demands a shift in evaluation, from static accuracy to dynamic behavioral reliability. This study empirically charts the capability boundaries of dental LLMs, providing a roadmap for bridging the gap between standardized knowledge and safe, autonomous clinical practice.
arXiv Detail & Related papers (2026-01-19T11:36:39Z)
- Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment [9.422745886489801]
Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. We present an iterative post-deployment alignment framework that applies Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) to refine models against domain-specific safety signals.
arXiv Detail & Related papers (2025-12-03T19:30:07Z)
- DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services [49.70819009392778]
Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, multi-agent system for simulating realistic scenarios.
arXiv Detail & Related papers (2025-10-24T08:01:21Z)
- Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models [3.9481669393262675]
We investigate how psychometric personality control grounded in the Big Five framework influences AI behavior in the context of capability and safety benchmarks. Our experiments reveal striking effects: for example, reducing conscientiousness leads to significant drops in safety-relevant metrics on benchmarks such as WMDP, TruthfulQA, ETHICS, and Sycophancy. These findings highlight personality shaping as a powerful and underexplored axis of model control that interacts with both safety and general competence.
arXiv Detail & Related papers (2025-09-19T18:19:56Z)
- Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in healthcare AI applications. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents is applied to autonomously mutate test cases, identify and evolve unsafe-triggering strategies, and evaluate responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
- Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings [48.096652370210016]
We introduce a safety evaluation protocol tailored to the medical domain from both patient and clinician user perspectives. This is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming that takes three different points of view.
arXiv Detail & Related papers (2025-07-09T19:38:58Z)
- DeVisE: Behavioral Testing of Medical Large Language Models [14.832083455439749]
DeVisE is a behavioral testing framework for probing fine-grained clinical understanding. We construct a dataset of ICU discharge notes from MIMIC-IV. We evaluate five LLMs spanning general-purpose and medically fine-tuned variants.
arXiv Detail & Related papers (2025-06-18T10:42:22Z)
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for $n$ consecutive random actions, and proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality (a sketch of one formalization appears after this list).
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Operationalizing Counterfactual Metrics: Incentives, Ranking, and Information Asymmetry [62.53919624802853]
We analyze the incentive misalignments that arise from average treated outcome metrics.
We show how counterfactual metrics can be modified to behave reasonably in patient-facing ranking systems.
arXiv Detail & Related papers (2023-05-24T00:24:38Z)
- KIDS: kinematics-based (in)activity detection and segmentation in a sleep case study [5.707737640557724]
Sleep behaviour and in-bed movements contain rich information on the neurophysiological health of people.
This paper proposes an online Bayesian probabilistic framework for objective (in)activity detection and segmentation based on clinically meaningful joint kinematics.
arXiv Detail & Related papers (2023-01-04T16:24:01Z)
- Interpretability by design using computer vision for behavioral sensing in child and adolescent psychiatry [3.975358343371988]
We use machine learning to derive behavioral codes or concepts from a gold-standard behavioral rating system.
Our ratings were comparable to human expert ratings for negative emotions, activity level/arousal, and anxiety.
arXiv Detail & Related papers (2022-07-11T09:07:08Z)
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models under changes to the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z)
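The last entry above tests clinical models by perturbing their inputs; the following minimal sketch illustrates that idea, assuming a label-producing classifier behind a `predict(text)` callable. The two perturbations and the toy note are illustrative placeholders, not that framework's actual test suite.

```python
# Input-perturbation behavioral test (sketch; perturbations hypothetical).
import re

def swap_gender(note):
    """Swap gendered tokens: a protected-attribute perturbation."""
    table = {"male": "female", "female": "male", "he": "she", "she": "he"}
    return re.sub(r"\b(male|female|he|she)\b",
                  lambda m: table[m.group(1).lower()], note, flags=re.IGNORECASE)

def add_irrelevant_sentence(note):
    """Append clinically irrelevant text; predictions should be invariant to it."""
    return note + " The patient arrived by taxi."

PERTURBATIONS = {"gender_swap": swap_gender, "irrelevant_text": add_irrelevant_sentence}

def behavioral_report(predict, notes):
    """Fraction of notes whose predicted label flips under each perturbation."""
    return {name: sum(predict(fn(n)) != predict(n) for n in notes) / len(notes)
            for name, fn in PERTURBATIONS.items()}

# Usage with a stand-in model keyed on a single token:
flip_rates = behavioral_report(
    lambda t: "high-risk" if "sepsis" in t else "low-risk",
    ["65-year-old male with suspected sepsis."],
)
```

A flip rate near zero under the irrelevant-text perturbation, and parity under the gender swap, are the invariances such a test suite checks for.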
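For the Criticality and Safety Margins entry above, the true-criticality definition admits a compact formalization. This is a sketch of one plausible reading; the notation (state $s$, policy $\pi$, return $G$) is chosen here and is not taken from that paper.

```latex
% True criticality of state s under policy \pi: the expected return lost when
% the agent takes n uniformly random actions from s before resuming \pi.
C_n(s) = \mathbb{E}_{\pi}\left[ G \mid s \right]
       - \mathbb{E}_{\mathrm{rand}(n) \to \pi}\left[ G \mid s \right]
```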