Mind the Performance Gap: Examining Dataset Shift During Prospective Validation
- URL: http://arxiv.org/abs/2107.13964v1
- Date: Fri, 23 Jul 2021 14:30:59 GMT
- Title: Mind the Performance Gap: Examining Dataset Shift During Prospective Validation
- Authors: Erkin Ötleş, Jeeheh Oh, Benjamin Li, Michelle Bochinski, Hyeon Joo, Justin Ortwine, Erica Shenoy, Laraine Washer, Vincent B. Young, Krishna Rao, Jenna Wiens
- Abstract summary: Once integrated into clinical care, patient risk stratification models may perform worse than they did in retrospective validation.
We compare the 2020-2021 ('20-'21) prospective performance of a patient risk stratification model for predicting healthcare-associated infections to a 2019-2020 ('19-'20) retrospective validation of the same model.
The resulting performance gap was primarily due to infrastructure shift and not temporal shift.
- Score: 6.232311195907715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Once integrated into clinical care, patient risk stratification models may
perform worse compared to their retrospective performance. To date, it is
widely accepted that performance will degrade over time due to changes in care
processes and patient populations. However, the extent to which this occurs is
poorly understood, in part because few researchers report prospective
validation performance. In this study, we compare the 2020-2021 ('20-'21)
prospective performance of a patient risk stratification model for predicting
healthcare-associated infections to a 2019-2020 ('19-'20) retrospective
validation of the same model. We define the difference between retrospective and
prospective performance as the performance gap. We estimate how i) "temporal
shift", i.e., changes in clinical workflows and patient populations, and ii)
"infrastructure shift", i.e., changes in access, extraction and transformation
of data, both contribute to the performance gap. Applied prospectively to
26,864 hospital encounters during a twelve-month period from July 2020 to June
2021, the model achieved an area under the receiver operating characteristic
curve (AUROC) of 0.767 (95% confidence interval (CI): 0.737, 0.801) and a Brier
score of 0.189 (95% CI: 0.186, 0.191). Prospective performance decreased
slightly compared to '19-'20 retrospective performance, in which the model
achieved an AUROC of 0.778 (95% CI: 0.744, 0.815) and a Brier score of 0.163
(95% CI: 0.161, 0.165). The resulting performance gap was primarily due to
infrastructure shift and not temporal shift. So long as we continue to develop
and validate models using data stored in large research data warehouses, we
must consider differences in how and when data are accessed, measure how these
differences may affect prospective performance, and work to mitigate those
differences.
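The abstract reports AUROC and Brier scores with 95% CIs and defines the performance gap as the difference between retrospective and prospective performance. The following minimal Python sketch (standard library only) shows one way these quantities can be computed. It assumes a nonparametric percentile bootstrap that resamples encounters with replacement; the abstract does not say how the authors derived their intervals, and the function names `auroc`, `brier_score`, `bootstrap_ci`, and `performance_gap` are hypothetical, not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical metric signature: (labels, scores) -> float
Metric = Callable[[Sequence[int], Sequence[float]], float]


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, averaging ranks over tied scores."""
    pairs = sorted(zip(y_score, y_true))  # sort by score; tied scores end up adjacent
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of scores tied with pairs[i]
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tied run i..j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        i = j + 1
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(
    metric: Metric,
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling encounters with replacement (assumed scheme)."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes must be present to bootstrap")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip single-class resamples, where AUROC is undefined
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi


def performance_gap(retrospective: float, prospective: float) -> float:
    """Performance gap = retrospective metric minus prospective metric."""
    return retrospective - prospective
```

Plugging in the reported point estimates, `performance_gap(0.778, 0.767)` gives about 0.011 for AUROC, while `performance_gap(0.163, 0.189)` gives about -0.026 for the Brier score. Because a lower Brier score is better, the negative sign in the second case also indicates worse prospective performance, so the sign has to be read per metric.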
Related papers
- Equitable Length of Stay Prediction for Patients with Learning Disabilities and Multiple Long-term Conditions Using Machine Learning [1.0064817439176887]
This study analyses hospitalisations of 9,618 patients identified with learning disabilities and long-term conditions for the population of Wales.
We describe the demographic characteristics, prevalence of long-term conditions, medication history, hospital visits, and lifestyle history for our study cohort.
We apply machine learning models to predict the length of hospital stays for this cohort.
Date: 2024-11-03T20:14:20Z
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
Date: 2024-04-26T16:39:50Z
- APRICOT-Mamba: Acuity Prediction in Intensive Care Unit (ICU): Development and Validation of a Stability, Transitions, and Life-Sustaining Therapies Prediction Model [12.370938858314911]
The acuity state of patients in the intensive care unit (ICU) can quickly change from stable to unstable.
Early detection of deteriorating conditions can result in providing timely interventions and improved survival rates.
We propose APRICOT-M (Acuity Prediction in Intensive Care Unit-Mamba) to predict acuity state, transitions, and the need for life-sustaining therapies in real-time in ICU patients.
Date: 2023-11-03T16:52:27Z
- On the explainability of hospitalization prediction on a large COVID-19 patient dataset [45.82374977939355]
We develop various AI models to predict hospitalization on a large (over 110k) cohort of COVID-19 positive-tested US patients.
Despite high data imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and F-score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class.
Date: 2021-10-28T10:23:38Z
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
Date: 2021-04-07T06:02:04Z
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
Date: 2021-02-18T21:14:52Z
- Continual Deterioration Prediction for Hospitalized COVID-19 Patients [3.3581926090154113]
We develop a temporal stratification approach to make daily predictions of patients' outcomes at the end of their hospital stay.
Preliminary experiments show 0.98 AUROC, 0.91 F1 score and 0.97 AUPR on continuous deterioration prediction.
Date: 2021-01-19T12:03:56Z
- Individualized Prediction of COVID-19 Adverse outcomes with MLHO [9.197411456718708]
We developed an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health outcomes.
We modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics.
Our results demonstrated that while demographic variables are important predictors of adverse outcomes after a COVID-19 infection, incorporating past clinical records is vital for a reliable prediction model.
Date: 2020-08-10T02:44:52Z
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
Date: 2020-06-15T20:48:43Z
- Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan [49.209225484926634]
We propose a joint classification and regression method to determine whether a patient will develop severe symptoms at a later time.
To do this, the proposed method takes into account 1) the weight of each sample, to reduce the influence of outliers and explore the problem of imbalanced classification.
Our proposed method yields 76.97% accuracy in predicting severe cases, a correlation coefficient of 0.524, and a 0.55-day difference in the predicted conversion time.
Date: 2020-05-07T12:16:37Z
This list is automatically generated from the titles and abstracts of the papers on this site.