Related papers: On the Limits of Selective AI Prediction: A Case Study in Clinical Decision Making

URL: http://arxiv.org/abs/2508.07617v1
Date: Mon, 11 Aug 2025 04:53:13 GMT
Title: On the Limits of Selective AI Prediction: A Case Study in Clinical Decision Making
Authors: Sarah Jabbour, David Fouhey, Nikola Banovic, Stephanie D. Shepard, Ella Kazerooni, Michael W. Sjoding, Jenna Wiens
Abstract summary: We study the effects of selective prediction on human decisions in a clinical context. Our findings indicate that selective prediction mitigates the negative effects of inaccurate AI in terms of decision accuracy.
Score: 13.982768346782386
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI has the potential to augment human decision making. However, even high-performing models can produce inaccurate predictions when deployed. These inaccuracies, combined with automation bias, where humans overrely on AI predictions, can result in worse decisions. Selective prediction, in which potentially unreliable model predictions are hidden from users, has been proposed as a solution. This approach assumes that when AI abstains and informs the user so, humans make decisions as they would without AI involvement. To test this assumption, we study the effects of selective prediction on human decisions in a clinical context. We conducted a user study of 259 clinicians tasked with diagnosing and treating hospitalized patients. We compared their baseline performance without any AI involvement to their AI-assisted accuracy with and without selective prediction. Our findings indicate that selective prediction mitigates the negative effects of inaccurate AI in terms of decision accuracy. Compared to no AI assistance, clinician accuracy declined when shown inaccurate AI predictions (66% [95% CI: 56%-75%] vs. 56% [95% CI: 46%-66%]), but recovered under selective prediction (64% [95% CI: 54%-73%]). However, while selective prediction nearly maintains overall accuracy, our results suggest that it alters patterns of mistakes: when informed the AI abstains, clinicians underdiagnose (18% increase in missed diagnoses) and undertreat (35% increase in missed treatments) compared to no AI input at all. Our findings underscore the importance of empirically validating assumptions about how humans engage with AI within human-AI systems.
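The abstract describes selective prediction only at a high level: predictions judged potentially unreliable are hidden, and the user is told that the AI abstains. A minimal sketch of that mechanism follows, assuming abstention is triggered by a confidence threshold on the model's top class probability. The threshold value, the function names, and the class labels are hypothetical and are not taken from the paper, which does not state its abstention criterion here.

```python
from typing import Optional, Sequence


def selective_predict(probs: Sequence[float], threshold: float = 0.8) -> Optional[int]:
    """Return the index of the most probable class, or None to abstain.

    Assumption: the model abstains whenever its top class probability
    falls below `threshold`. This is one common criterion, not
    necessarily the one used in the study.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def display(prediction: Optional[int], labels: Sequence[str]) -> str:
    """Build the message shown to the user, making abstention explicit."""
    if prediction is None:
        return "AI abstains: no prediction shown"
    return f"AI prediction: {labels[prediction]}"


# Hypothetical labels, for illustration only.
labels = ["condition A", "condition B"]
print(display(selective_predict([0.92, 0.08]), labels))
print(display(selective_predict([0.55, 0.45]), labels))
```

In this sketch the abstention message is always shown explicitly, mirroring the setting in the abstract where clinicians are informed that the AI abstains rather than simply seeing nothing.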

Related papers

Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public [46.86429592892395]
Explainable AI (XAI) addresses this by providing AI decision-making insight. We present results from two large-scale experiments combining a fairness-based diagnosis AI model and different XAI explanations.
arXiv Detail & Related papers (2025-12-14T00:06:06Z)
Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care [2.4339626079536925]
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI is limited to make practical contributions in the clinical setting.
arXiv Detail & Related papers (2025-07-02T01:43:06Z)
Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis. We focus on the selective prediction approach that allows the diagnosis system to abstain from providing the decision if it is not confident in the diagnosis. We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z)
AI-Assisted Decision Making with Human Learning [8.598431584462944]
In many cases, despite the algorithm's superior performance, the final decision remains in human hands. This paper studies such AI-assisted decision-making settings, where the human learns through repeated interactions with the algorithm. We observe that the discrepancy between the algorithm's model and the human's model creates a fundamental tradeoff.
arXiv Detail & Related papers (2025-02-18T17:08:21Z)
Human-Alignment Influences the Utility of AI-assisted Decision Making [16.732483972136418]
We investigate to what extent the degree of alignment actually influences the utility of AI-assisted decision making. Our results show a positive association between the degree of alignment and the utility of AI-assisted decision making.
arXiv Detail & Related papers (2025-01-23T19:01:47Z)
Using AI Uncertainty Quantification to Improve Human Decision-Making [14.878886078377562]
AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone. We evaluated the impact on human decision-making for instance-level UQ, using a strict scoring rule, in two online behavioral experiments.
arXiv Detail & Related papers (2023-09-19T18:01:25Z)
Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time. We discuss how biased models can lead to more negative real-world outcomes for certain groups. If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z)
Does Explainable Artificial Intelligence Improve Human Decision-Making? [17.18994675838646]
We compare and evaluate objective human decision accuracy without AI (control), with an AI prediction (no explanation) and AI prediction with explanation. We find any kind of AI prediction tends to improve user decision accuracy, but no conclusive evidence that explainable AI has a meaningful impact. Our results indicate that, at least in some situations, the "why" information provided in explainable AI may not enhance user decision-making.
arXiv Detail & Related papers (2020-06-19T15:46:13Z)
Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance. We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves. Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z)
Artificial Artificial Intelligence: Measuring Influence of AI 'Assessments' on Moral Decision-Making [48.66982301902923]
We examined the effect of feedback from false AI on moral decision-making about donor kidney allocation. We found some evidence that judgments about whether a patient should receive a kidney can be influenced by feedback about participants' own decision-making perceived to be given by AI.
arXiv Detail & Related papers (2020-01-13T14:15:18Z)
Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI. We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.