Related papers: Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models

Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models

URL: http://arxiv.org/abs/2602.11360v1
Date: Wed, 11 Feb 2026 20:47:30 GMT
Title: Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models
Authors: Sara Matijevic, Christopher Yau,
Abstract summary: We propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks.<n>This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties.<n>We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models.
Score: 2.1127261244588156
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical prediction models are increasingly used to support patient care, yet many deep learning-based approaches remain unstable, as their predictions can vary substantially when trained on different samples from the same population. Such instability undermines reliability and limits clinical adoption. In this study, we propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks. This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties. We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models using simulated data and three clinical datasets: GUSTO-I, Framingham, and SUPPORT. Across all datasets, our model exhibited improved prediction stability, with lower mean absolute differences (e.g., 0.019 vs. 0.059 in GUSTO-I; 0.057 vs. 0.088 in Framingham) and markedly fewer significantly deviating predictions. Importantly, discriminative performance and feature importance consistency were maintained, with high SHAP correlations between models (e.g., 0.894 for GUSTO-I; 0.965 for Framingham). While ensemble models achieved greater stability, we show that this came at the expense of interpretability, as each constituent model used predictors in different ways. By regularising predictions to align with bootstrapped distributions, our approach allows prediction models to be developed that achieve greater robustness and reproducibility without sacrificing interpretability. This method provides a practical route toward more reliable and clinically trustworthy deep learning models, particularly valuable in data-limited healthcare settings.

Related papers

Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare [0.0]
We propose an evaluation framework that quantifies individual-level prediction instability by using two complementary diagnostics.<n>We apply these diagnostics to simulated data and GUSTO-I clinical dataset.
arXiv Detail & Related papers (2026-02-27T03:42:28Z)
Latent Neural-ODE for Model-Informed Precision Dosing: Overcoming Structural Assumptions in Pharmacokinetics [3.0991186209192794]
We introduce a novel data-driven alternative based on Latent Ordinary Differential Equations (Latent ODEs) for tacrolimus AUC prediction.<n>This deep learning approach learns individualized dynamics directly from sparse clinical data.<n>Latent ODE model demonstrated superior robustness, maintaining high accuracy even when underlying biological mechanisms deviated from standard assumptions.
arXiv Detail & Related papers (2026-02-03T07:30:48Z)
Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data [18.017666750186336]
SASL is an interpretable modeling approach explicitly designed for personalized health prediction.<n>We develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets.<n>For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating.
arXiv Detail & Related papers (2025-10-03T09:17:57Z)
On Arbitrary Predictions from Equally Valid Models [49.56463611078044]
Model multiplicity refers to multiple machine learning models that admit conflicting predictions for the same patient.<n>We show that even small ensembles can mitigate/eliminate predictive multiplicity in practice.
arXiv Detail & Related papers (2025-07-25T16:15:59Z)
Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy [0.9999629695552196]
The present work develops and validates a data-driven and interpretable machine-learning framework designed to predict strokes.<n>Ten routinely gathered demographic, lifestyle, and clinical variables were sourced from a public cohort of 4,981 records.<n>The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model.
arXiv Detail & Related papers (2025-05-18T21:46:45Z)
Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations.<n>We propose an evidential regression model specifically designed for time-to-event prediction.<n>We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z)
An Epistemic and Aleatoric Decomposition of Arbitrariness to Constrain the Set of Good Models [7.620967781722717]
Recent research reveals that machine learning (ML) models are highly sensitive to minor changes in their training procedure.<n>We show that stability decomposes into epistemic and aleatoric components, capturing the consistency and confidence in prediction.<n>We propose a model selection procedure that includes epistemic and aleatoric criteria alongside existing accuracy and fairness criteria, and show that it successfully narrows down a large set of good models.
arXiv Detail & Related papers (2023-02-09T09:35:36Z)
When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions. Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations. We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research. Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD) UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.