When accurate prediction models yield harmful self-fulfilling prophecies
- URL: http://arxiv.org/abs/2312.01210v3
- Date: Thu, 8 Feb 2024 10:21:04 GMT
- Title: When accurate prediction models yield harmful self-fulfilling prophecies
- Authors: Wouter A.C. van Amsterdam, Nan van Geloven, Jesse H. Krijthe, Rajesh Ranganath, Giovanni Cinà
- Abstract summary: We show that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment.
Our results point to the need to revise standard practices for validation, deployment and evaluation of prediction models that are used in medical decisions.
- Score: 17.49185224494467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Prediction models are popular in medical research and practice. By
predicting an outcome of interest for specific patients, these models may help
inform difficult treatment decisions, and are often hailed as the poster
children for personalized, data-driven healthcare. Many prediction models are
deployed for decision support based on their prediction accuracy in validation
studies. We investigate whether this is a safe and valid approach.
Materials and Methods: We show that using prediction models for decision
making can lead to harmful decisions, even when the predictions exhibit good
discrimination after deployment. These models are harmful self-fulfilling
prophecies: their deployment harms a group of patients, yet the worse outcomes of
these patients do not invalidate the model's predictive power.
Results: Our main result is a formal characterization of a set of such
prediction models. We then show that models that are well calibrated both before
and after deployment are useless for decision making, as their deployment leaves
the data distribution unchanged.
Discussion: Our results point to the need to revise standard practices for
validation, deployment and evaluation of prediction models that are used in
medical decisions.
Conclusion: Outcome prediction models can yield harmful self-fulfilling
prophecies when used for decision making; a new perspective on prediction model
development, deployment, and monitoring is needed.
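The mechanism the abstract describes can be illustrated with a small simulation (a hypothetical sketch, not the authors' formal characterization; the risk values, treatment effect, and 0.5 flagging threshold are all invented for illustration): an accurate model flags high-risk patients, a beneficial treatment is withheld from them, their outcomes worsen as a result, and yet the model's discrimination after deployment remains good.

```python
import random

def simulate(withhold_from_high_risk, n=100_000, seed=0):
    """Return (event rate among flagged patients, event rate among unflagged).

    Each patient has a true untreated risk of a bad outcome; the model flags
    patients with risk > 0.5 (perfect discrimination by construction). A
    treatment halves the risk. The deployment policy decides who is treated.
    """
    rng = random.Random(seed)
    events = {True: 0, False: 0}   # bad outcomes per group (flagged / unflagged)
    counts = {True: 0, False: 0}   # patients per group
    for _ in range(n):
        risk = rng.random()                  # true untreated risk
        flagged = risk > 0.5                 # model's prediction
        treated = not (flagged and withhold_from_high_risk)
        p_event = risk * (0.5 if treated else 1.0)
        events[flagged] += rng.random() < p_event
        counts[flagged] += 1
    return events[True] / counts[True], events[False] / counts[False]

# Baseline policy: everyone is treated. Flagged patients still fare worse,
# so the model discriminates well.
hi_all, lo_all = simulate(withhold_from_high_risk=False)

# Self-fulfilling policy: treatment withheld from flagged patients. They are
# actively harmed, yet the gap between groups widens -- post-deployment
# discrimination looks even better, so accuracy alone cannot reveal the harm.
hi_withheld, lo_withheld = simulate(withhold_from_high_risk=True)
```

Under this sketch the flagged group's event rate roughly doubles when treatment is withheld, while the flagged/unflagged gap grows, which is exactly why post-deployment discrimination metrics do not detect the harm.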
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- A roadmap to fair and trustworthy prediction model validation in healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data.
We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z)
- From algorithms to action: improving patient care requires causality [18.154976419582873]
Most outcome prediction models are developed and validated without regard to the causal aspects of treatment decision making.
Guidelines on prediction model validation and the checklist for risk model endorsement by the American Joint Committee on Cancer do not protect against prediction models that are accurate during development and validation but harmful when used for decision making.
arXiv Detail & Related papers (2022-09-15T15:57:17Z)
- Predicting from Predictions [18.393971232725015]
We study how causal effects of predictions on outcomes can be identified from observational data.
We show that supervised learners that predict from predictions can recover transferable functional relationships between features, predictions, and outcomes.
arXiv Detail & Related papers (2022-08-15T16:57:02Z)
- A Machine Learning Model for Predicting, Diagnosing, and Mitigating Health Disparities in Hospital Readmission [0.0]
We propose a machine learning pipeline capable of making predictions as well as detecting and mitigating biases in the data and model predictions.
We evaluate the performance of the proposed method on a clinical dataset using accuracy and fairness measures.
arXiv Detail & Related papers (2022-06-13T16:07:25Z)
- Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction [9.199022926064009]
The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models.
We present a method to provide people with clinical expertise with domain-relevant evidence about why a prediction should be trusted.
arXiv Detail & Related papers (2021-03-04T00:26:32Z)
- When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making [68.19284302320146]
We carry out user studies to assess how people with differing levels of expertise respond to different types of predictive uncertainty.
We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions.
This suggests that posterior predictive distributions can serve as useful decision aids, though they should be used with caution, taking into account the type of distribution and the expertise of the user.
arXiv Detail & Related papers (2020-11-12T02:23:53Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Counterfactual Predictions under Runtime Confounding [74.90756694584839]
We study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data.
We propose a doubly-robust procedure for learning counterfactual prediction models in this setting.
arXiv Detail & Related papers (2020-06-30T15:49:05Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.