Translating Machine Learning Interpretability into Clinical Insights for ICU Mortality Prediction
- URL: http://arxiv.org/abs/2508.00919v1
- Date: Wed, 30 Jul 2025 02:19:06 GMT
- Title: Translating Machine Learning Interpretability into Clinical Insights for ICU Mortality Prediction
- Authors: Ling Liao, Eva Aagaard
- Abstract summary: We developed and rigorously evaluated two machine learning models along with interpretation mechanisms. We examined two datasets: one with imputed missing values (130,810 patients, 5.58% ICU mortality) and another excluding patients with missing data (5,661 patients, 23.65% ICU mortality). The random forest (RF) model demonstrated an AUROC of 0.912 with the first dataset and 0.839 with the second dataset, while the XGBoost model achieved an AUROC of 0.924 with the first dataset and 0.834 with the second dataset.
- Score: 0.18416014644193068
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current research efforts largely focus on employing at most one interpretable method to elucidate machine learning (ML) model performance. However, significant barriers remain in translating these interpretability techniques into actionable insights for clinicians, notably due to complexities such as variability across clinical settings and the Rashomon effect. In this study, we developed and rigorously evaluated two ML models along with interpretation mechanisms, utilizing data from 131,051 ICU admissions across 208 hospitals in the United States, sourced from the eICU Collaborative Research Database. We examined two datasets: one with imputed missing values (130,810 patients, 5.58% ICU mortality) and another excluding patients with missing data (5,661 patients, 23.65% ICU mortality). The random forest (RF) model demonstrated an AUROC of 0.912 with the first dataset and 0.839 with the second dataset, while the XGBoost model achieved an AUROC of 0.924 with the first dataset and 0.834 with the second dataset. Consistently identified predictors of ICU mortality across datasets, cross-validation folds, models, and explanation mechanisms included lactate levels, arterial pH, body temperature, and others. By aligning with routinely collected clinical variables, this study aims to enhance ML model interpretability for clinical use, promote greater understanding and adoption among clinicians, and ultimately contribute to improved patient outcomes.
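The AUROC figures quoted in the abstract can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen survivor. The following is a minimal sketch of that definition, not the authors' evaluation code: the function name `auroc` and the toy labels and scores are assumptions made up for illustration, with label 1 standing for ICU death.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0);
    tied scores count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not taken from the eICU database).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 0]
toy_scores = [0.05, 0.20, 0.35, 0.40, 0.70, 0.90, 0.10, 0.30]

print(f"toy AUROC = {auroc(toy_labels, toy_scores):.3f}")
```

The pairwise loop is quadratic in the number of patients, so it is only suitable for small examples; on cohorts of the size described here a rank-based implementation from an established library would be the practical choice.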
Related papers
- Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning [3.4335475695580127]
Hypertensive kidney disease (HKD) patients in intensive care units (ICUs) face high short-term mortality. We developed a machine learning framework to predict 30-day in-hospital mortality among ICU patients with HKD.
arXiv Detail & Related papers (2025-07-25T00:48:23Z)
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z)
- Development of Interactive Nomograms for Predicting Short-Term Survival in ICU Patients with Aplastic Anemia [3.5626691568652507]
Aplastic anemia is a rare, life-threatening hematologic disorder characterized by pancytopenia and bone marrow failure. We used the MIMIC-IV database to identify ICU patients with aplastic anemia and extracted clinical features from five domains. Logistic regression and Cox regression models were constructed to predict 7-, 14-, and 28-day mortality.
arXiv Detail & Related papers (2025-05-23T23:01:11Z)
- Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting LOS in ICU specifically for the patients with neurological diseases based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and Neural Networks (LSTM, BERT and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
- CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE).
CRTRE applies randomized trial design principles to estimate the causal effect of association rules.
We then incorporate such association rules for downstream applications such as prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
- Optimizing Mortality Prediction for ICU Heart Failure Patients: Leveraging XGBoost and Advanced Machine Learning with the MIMIC-III Database [1.5186937600119894]
Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates.
Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood.
This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes.
arXiv Detail & Related papers (2024-09-03T07:57:08Z)
- Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality [0.0]
Sepsis is a severe condition responsible for many deaths in the United States and worldwide. Previous studies employing machine learning faced limitations in feature selection and model interpretability. This research aimed to develop an interpretable and accurate machine learning model to predict in-hospital sepsis mortality.
arXiv Detail & Related papers (2024-08-03T00:28:25Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z)
- A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA).
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
- On the explainability of hospitalization prediction on a large COVID-19 patient dataset [45.82374977939355]
We develop various AI models to predict hospitalization on a large (over 110$k$) cohort of COVID-19 positive-tested US patients.
Despite high data unbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and $F_1$ score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class.
arXiv Detail & Related papers (2021-10-28T10:23:38Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- All Data Inclusive, Deep Learning Models to Predict Critical Events in the Medical Information Mart for Intensive Care III Database (MIMIC III) [0.0]
This study was performed using 42,818 hospital admissions involving 35,348 patients.
Over 75 million events across multiple data sources were processed, resulting in over 355 million tokens.
It is possible to predict in-hospital mortality with much better confidence and higher reliability from models built using all sources of data.
arXiv Detail & Related papers (2020-09-02T22:12:18Z)