Stability of clinical prediction models developed using statistical or
machine learning methods
- URL: http://arxiv.org/abs/2211.01061v1
- Date: Wed, 2 Nov 2022 11:55:28 GMT
- Title: Stability of clinical prediction models developed using statistical or
machine learning methods
- Authors: Richard D Riley and Gary S Collins
- Abstract summary: Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors.
Many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks).
We show instability in a model's estimated risks is often considerable, and manifests itself as miscalibration of predictions in new data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clinical prediction models estimate an individual's risk of a particular
health outcome, conditional on their values of multiple predictors. A developed
model is a consequence of the development dataset and the chosen model building
strategy, including the sample size, number of predictors and analysis method
(e.g., regression or machine learning). Here, we raise the concern that many
models are developed using small datasets that lead to instability in the model
and its predictions (estimated risks). We define four levels of model stability
in estimated risks moving from the overall mean to the individual level. Then,
through simulation and case studies of statistical and machine learning
approaches, we show instability in a model's estimated risks is often
considerable, and ultimately manifests itself as miscalibration of predictions
in new data. Therefore, we recommend that researchers always examine
instability at the model development stage, and we propose instability plots
and measures for doing so. This entails repeating the model building steps
(those used
in the development of the original prediction model) in each of multiple (e.g.,
1000) bootstrap samples, to produce multiple bootstrap models, and then
deriving (i) a prediction instability plot of bootstrap model predictions
(y-axis) versus original model predictions (x-axis); (ii) a calibration
instability plot showing calibration curves for the bootstrap models in the
original sample; and (iii) the instability index, which is the mean absolute
difference between individuals' original and bootstrap model predictions. A
case study is used to illustrate how these instability assessments help
reassure (or not) that model predictions are likely to be reliable,
whilst also informing a model's critical appraisal (risk of bias rating),
fairness assessment and further validation requirements.
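To make the proposed procedure concrete, below is a minimal Python sketch of the bootstrap instability assessment described above, assuming a binary outcome, plain logistic regression as the model building strategy, and NumPy/scikit-learn; the function names and the arrays `X` and `y` are illustrative, not from the paper.

```python
# Minimal sketch of the bootstrap instability assessment (assumed setup:
# binary outcome, logistic regression as the model building strategy).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=1)

def build_model(X, y):
    # In practice, repeat the FULL original model building strategy here
    # (predictor selection, tuning, etc.); logistic regression for brevity.
    return LogisticRegression(max_iter=1000).fit(X, y)

def instability_assessment(X, y, B=1000):
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    original = build_model(X, y)
    p_orig = original.predict_proba(X)[:, 1]      # original estimated risks

    p_boot = np.empty((B, n))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # bootstrap resample
        boot = build_model(X[idx], y[idx])        # rebuild the model
        # bootstrap model predictions for the ORIGINAL individuals
        p_boot[b] = boot.predict_proba(X)[:, 1]

    # (iii) instability index: mean absolute difference between individuals'
    # original and bootstrap model predictions, averaged over all B models
    instability_index = np.abs(p_boot - p_orig).mean()
    return p_orig, p_boot, instability_index
```

Plotting the rows of `p_boot` (y-axis) against `p_orig` (x-axis) would give the prediction instability plot (i), and a smoothed calibration curve of `y` against each bootstrap model's predictions in the original sample would give the calibration instability plot (ii).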
Related papers
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- A performance characteristic curve for model evaluation: the application in information diffusion prediction [3.8711489380602804]
We propose a metric based on information entropy to quantify the randomness in diffusion data, then identify a scaling pattern between the randomness and the prediction accuracy of the model.
Data points from patterns with different sequence lengths, system sizes, and degrees of randomness all collapse onto a single curve, capturing a model's inherent capability of making correct predictions.
The validity of the curve is tested by three prediction models in the same family, reaching conclusions in line with existing studies.
arXiv Detail & Related papers (2023-09-18T07:32:57Z)
- Surrogate uncertainty estimation for your time series forecasting black-box: learn when to trust [2.0393477576774752]
Our research introduces a method for uncertainty estimation.
It enhances any base regression model with reasonable uncertainty estimates.
Using various time-series forecasting data, we found that our surrogate model-based technique delivers significantly more accurate confidence intervals.
arXiv Detail & Related papers (2023-02-06T14:52:56Z)
- Calibration tests beyond classification [30.616624345970973]
Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
arXiv Detail & Related papers (2022-10-21T09:49:57Z)
- Predictive Multiplicity in Probabilistic Classification [25.111463701666864]
We present a framework for measuring predictive multiplicity in probabilistic classification.
We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks.
Our results emphasize the need to report predictive multiplicity more widely.
arXiv Detail & Related papers (2022-06-02T16:25:29Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Causality and Generalizability: Identifiability and Learning Methods [0.0]
This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust prediction methods.
We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization.
We propose a general framework for distributional robustness with respect to intervention-induced distributions.
arXiv Detail & Related papers (2021-10-04T13:12:11Z)
- Test-time Collective Prediction [73.74982509510961]
In many machine learning settings, multiple parties want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.