Related papers: Limitations of ROC on Imbalanced Data: Evaluation of LVAD Mortality Risk Scores

URL: http://arxiv.org/abs/2010.16253v1
Date: Thu, 29 Oct 2020 11:10:15 GMT
Title: Limitations of ROC on Imbalanced Data: Evaluation of LVAD Mortality Risk Scores
Authors: Faezeh Movahedi, Rema Padman, James F. Antaki
Abstract summary: The receiver operating characteristic (ROC) is a commonly applied metric of classifier performance. The ROC can provide a distorted view of a classifier's ability to predict short-term mortality because patients who survive vastly outnumber those who do not. This study compared the ROC and the precision-recall curve (PRC) for the outputs of two classifiers of 90-day LVAD mortality.
Score: 2.578242050187029
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract:
Objective: This study illustrates the ambiguity of the ROC in evaluating two classifiers of 90-day LVAD mortality. It also introduces the precision-recall curve (PRC) as a supplemental metric that better represents LVAD classifiers' performance in predicting the minority class.
Background: In the LVAD domain, the receiver operating characteristic (ROC) is a commonly applied metric of classifier performance. However, the ROC can provide a distorted view of a classifier's ability to predict short-term mortality because patients who survive vastly outnumber those who do not, i.e., the data are imbalanced.
Methods: This study compared the ROC and PRC for the outputs of two classifiers of 90-day LVAD mortality for 800 patients (test group) recorded in INTERMACS who received a continuous-flow LVAD between 2006 and 2016 (mean age 59 years; 146 females vs. 654 males), among whom the 90-day mortality rate is only 8% (imbalanced data). The two classifiers were the HeartMate Risk Score (HMRS) and a Random Forest (RF).
Results: The ROC indicates fairly good performance of the RF and HMRS classifiers, with areas under the curve (AUC) of 0.77 vs. 0.63, respectively. This contrasts with their PRCs, with AUCs of 0.43 vs. 0.16 for RF and HMRS, respectively. The PRC for HMRS showed that precision rapidly dropped to only 10% as sensitivity increased only slightly.
Conclusion: The ROC can portray an overly optimistic performance of a classifier or risk score when applied to imbalanced data. The PRC provides better insight into the performance of a classifier by focusing on the minority class.
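Why the ROC and PRC can disagree on imbalanced data can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code, and it uses synthetic scores rather than INTERMACS data. Assumptions, all hypothetical: risk scores follow a toy binormal model (deaths ~ N(1.0, 1), survivors ~ N(0, 1)); the 800-patient size and 8% prevalence only mirror the abstract; ROC AUC is computed with the pairwise (Mann-Whitney) formulation; and PR AUC is summarized by step-wise average precision, which is one of several ways to summarize a PRC and may differ from the method used in the paper. The helper `precision_at` applies the identity PPV = π·TPR / (π·TPR + (1 − π)·FPR), where π is the prevalence. The outputs are illustrative and are not meant to reproduce the reported AUCs.

```python
import random


def precision_at(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision (PPV) of an ROC operating point (TPR, FPR) at a given class prevalence."""
    tp = prevalence * tpr            # expected true-positive fraction of all patients
    fp = (1.0 - prevalence) * fpr    # expected false-positive fraction of all patients
    return tp / (tp + fp)


def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the PRC: mean precision at the rank of each positive (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k   # precision among the top-k ranked patients
    return total / n_pos


def simulate(n_pos, n_neg, shift, seed=0):
    """Hypothetical binormal toy data: positives ~ N(shift, 1), negatives ~ N(0, 1)."""
    rng = random.Random(seed)
    scores = [rng.gauss(shift, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return labels, scores


if __name__ == "__main__":
    # Same score separation, 800 synthetic patients, two different class balances.
    for name, n_pos in [("balanced", 400), ("8% deaths", 64)]:
        labels, scores = simulate(n_pos, 800 - n_pos, shift=1.0)
        print(f"{name:10s} ROC AUC={roc_auc(labels, scores):.2f} "
              f"AP={average_precision(labels, scores):.2f} "
              f"(random-classifier AP ~ {n_pos / 800:.2f})")
```

At the same operating point (TPR = 0.5, FPR = 0.1), `precision_at` gives about 0.83 at 50% prevalence but only about 0.30 at 8% prevalence, even though the ROC point is identical: TPR and FPR are each computed within a single class, whereas precision mixes the two classes. The simulation shows the same effect at the curve level, with ROC AUC staying roughly constant across class balances while average precision drops sharply when deaths are rare.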

Related papers

Externally Validated Longitudinal GRU Model for Visit-Level 180-Day Mortality Risk in Metastatic Castration-Resistant Prostate Cancer [0.5361389213879222]
Metastatic castration-resistant prostate cancer (mCRPC) is a highly aggressive disease with poor prognosis and heterogeneous treatment response. We developed and validated a visit-level 180-day mortality risk model using longitudinal data from two Phase III cohorts.
arXiv Detail & Related papers (2026-01-27T20:48:53Z)
Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
Generalizable Diabetes Risk Stratification via Hybrid Machine Learning Models [0.0]
Diabetes affects over 537 million people worldwide and is projected to reach 783 million by 2045. We compare two hybrid classifiers and assess their generalizability on an external cohort.
arXiv Detail & Related papers (2025-09-24T21:18:52Z)
Deep Active Learning for Lung Disease Severity Classification from Chest X-rays: Learning with Less Data in the Presence of Class Imbalance [0.0]
This study collected 2,319 chest X-rays from 963 patients at Emory Healthcare affiliated hospitals between January and November 2020. A deep neural network with Monte Carlo Dropout was trained using active learning to classify disease severity. Deep active learning with Bayesian neural network (BNN) approximation and a weighted loss effectively reduces labeled-data requirements.
arXiv Detail & Related papers (2025-08-28T23:29:56Z)
A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke [3.2906073576204955]
Existing machine learning models have shown suboptimal predictive performance, limited generalisability, and have overlooked system-level factors. We developed an interpretable multi-level stacking ensemble model for ischaemic and haemorrhagic stroke. An explainable ensemble model effectively predicted prolonged LOS in ischaemic stroke. Further validation is needed for haemorrhagic stroke.
arXiv Detail & Related papers (2025-05-30T01:08:26Z)
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting ICU length of stay (LOS) specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost, and CatBoost) and neural networks (LSTM, BERT, and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
A Transformer-based survival model for prediction of all-cause mortality in heart failure patients: a multi-cohort study [5.831730826863567]
We developed and validated TRisk, a Transformer-based AI model predicting 36-month mortality in heart failure patients. Our study included 403,534 heart failure patients (ages 40-90) from 1,418 English general practices.
arXiv Detail & Related papers (2025-03-16T01:53:50Z)
Deep Learning Models to Automate the Scoring of Hand Radiographs for Rheumatoid Arthritis [0.0]
The Sharp (SvdH) score is a widely used radiographic scoring method to quantify damage in Rheumatoid Arthritis (RA) in clinical trials. We developed a bespoke, automated pipeline that is capable of predicting the SvdH score and RA severity from hand radiographs without the need to localise the joints first.
arXiv Detail & Related papers (2024-06-14T12:43:16Z)
Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray [86.38767955626179]
A deep-learning algorithm to predict the coronary artery calcium (CAC) score was developed on 460 chest X-rays. The diagnostic accuracy of the AICAC model, assessed by the area under the curve (AUC), was the primary outcome.
arXiv Detail & Related papers (2024-03-27T16:56:14Z)
Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects [27.786240241494436]
The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of models trained on healthy subjects when they are presented with impaired or diseased patients. We applied the COBRA score to address a key limitation of current clinical evaluation of upper-body impairment in stroke patients.
arXiv Detail & Related papers (2023-11-21T18:45:52Z)
A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images [10.860934781772098]
The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC. The incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making.
arXiv Detail & Related papers (2023-11-01T15:07:39Z)
Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
This study investigated the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency maps. ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. ViTs had CXR classification AUCs comparable to those of state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging [67.02500668641831]
Deep learning models trained on noisy datasets are sensitive to the noise type and generalize poorly to unseen samples. We propose a Robust Stochastic Knowledge Distillation (RoS-KD) framework, which mimics the notion of learning a topic from multiple sources to discourage learning from noisy information. RoS-KD learns a smooth, well-informed, and robust student manifold by distilling knowledge from multiple teachers trained on overlapping subsets of the training data.
arXiv Detail & Related papers (2022-10-15T22:32:20Z)
BIO-CXRNET: A Robust Multimodal Stacking Machine Learning Technique for Mortality Risk Prediction of COVID-19 Patients using Chest X-Ray Images and Clinical Data [0.0]
This study uses 25 biomarkers and CXR images to predict the risk in 930 COVID-19 patients admitted in Italy. The proposed multimodal stacking technique achieved a precision, sensitivity, and F1-score of 89.03%, 90.44%, and 89.03%, respectively. The nomogram-based scoring technique was able to predict the death probability of high-risk patients with an F1-score of 92.88%.
arXiv Detail & Related papers (2022-06-15T15:23:43Z)
Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation [22.45448597986172]
Multilabel gradient boosting machine provided the best discriminant power for stroke, major bleeding, and death. Models identified additional risk features (such as hemoglobin level, renal function, etc.) for each outcome.
arXiv Detail & Related papers (2022-02-02T15:15:03Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrast chest CT and segments the lesions, lungs, and lobes in three dimensions. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and the presence of high opacities. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions in Canada, Europe, and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.