Enhancing mortality prediction in cardiac arrest ICU patients through meta-modeling of structured clinical data from MIMIC-IV
- URL: http://arxiv.org/abs/2510.18103v1
- Date: Mon, 20 Oct 2025 20:56:45 GMT
- Title: Enhancing mortality prediction in cardiac arrest ICU patients through meta-modeling of structured clinical data from MIMIC-IV
- Authors: Nursultan Mamatov, Philipp Kellmeyer
- Abstract summary: This study develops and evaluates machine learning models that integrate structured clinical data and unstructured information. We used LASSO and XGBoost for feature selection, followed by a logistic regression trained on the top features identified by both models. The final logistic regression model, which combined structured and textual input, achieved an AUC of 0.918, compared to 0.753 when using structured data alone, a relative improvement of 22%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate early prediction of in-hospital mortality in intensive care units (ICUs) is essential for timely clinical intervention and efficient resource allocation. This study develops and evaluates machine learning models that integrate both structured clinical data and unstructured textual information, specifically discharge summaries and radiology reports, from the MIMIC-IV database. We used LASSO and XGBoost for feature selection, followed by a multivariate logistic regression trained on the top features identified by both models. Incorporating textual features using TF-IDF and BERT embeddings significantly improved predictive performance. The final logistic regression model, which combined structured and textual input, achieved an AUC of 0.918, compared to 0.753 when using structured data alone, a relative improvement of 22%. Decision curve analysis demonstrated a superior standardized net benefit across a wide range of threshold probabilities (0.2-0.8), confirming the clinical utility of the model. These results underscore the added prognostic value of unstructured clinical notes and support their integration into interpretable feature-driven risk prediction models for ICU patients.
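The two headline quantities in the abstract, the relative AUC improvement and the standardized net benefit from decision curve analysis, can be illustrated with a minimal sketch. It uses the standard decision-curve definition of net benefit (true positives minus false positives weighted by the threshold odds, per patient), divided by prevalence to standardize it. The abstract does not say how the authors computed these values, so the function names and the toy labels and risks below are hypothetical and serve only to show the arithmetic.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold probability: how many false positives
    # are considered "worth" one true positive at this threshold.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def standardized_net_benefit(y_true, y_prob, threshold):
    """Net benefit divided by outcome prevalence, so the maximum is 1."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


def relative_improvement(new, old):
    """Relative change of `new` over `old`, as a fraction."""
    return (new - old) / old


# Toy data (hypothetical, NOT from MIMIC-IV): 1 = died in hospital.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.35, 0.3, 0.1, 0.6, 0.2, 0.7, 0.05, 0.4]
treat_all = [1.0] * len(y_true)  # reference strategy: treat everyone

# Compare the model with "treat all" over the threshold range in the abstract.
for pt in (0.2, 0.4, 0.6, 0.8):
    model_snb = standardized_net_benefit(y_true, y_prob, pt)
    all_snb = standardized_net_benefit(y_true, treat_all, pt)
    print(f"threshold={pt:.1f}  model sNB={model_snb:.3f}  treat-all sNB={all_snb:.3f}")

# AUC values reported in the abstract: 0.918 (structured + text) vs 0.753 (structured only).
print(f"relative AUC improvement: {relative_improvement(0.918, 0.753):.1%}")
```

A decision curve plots the standardized net benefit against the threshold probability. A model is considered useful at a given threshold when its curve lies above both the "treat all" and "treat none" (net benefit of zero) reference strategies.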
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a visual-language foundation model for characterization, diagnosis and prognosis of renal mass. It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z)
- Improving Hospital Risk Prediction with Knowledge-Augmented Multimodal EHR Modeling [12.723098379155838]
We introduce a unified framework that seamlessly integrates structured and unstructured data for clinical risk prediction. A fine-tuned Large Language Model (LLM) extracts task-relevant information from clinical notes. The second stage combines both unstructured representations and features derived from the structured data to generate the final predictions.
arXiv Detail & Related papers (2025-08-04T01:03:16Z)
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z)
- Early Diagnosis of Atrial Fibrillation Recurrence: A Large Tabular Model Approach with Structured and Unstructured Clinical Data [0.0]
This study aims to predict AF recurrence between one month and two years after onset by evaluating traditional clinical scores, ML models, and our LTM approach.
arXiv Detail & Related papers (2025-05-20T17:31:05Z)
- Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries [3.5508427067904864]
In-hospital mortality (IHM) prediction for ICU patients is critical for timely interventions and efficient resource allocation.
This study integrates structured physiological data and clinical notes with Large Language Model (LLM)-generated expert summaries to improve IHM prediction accuracy.
arXiv Detail & Related papers (2024-11-25T16:36:38Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records [2.608410928225647]
This study developed and validated the RTSurv framework to structure unstructured electronic health records alongside structured clinical data. Using unstructured data from 34,276 patients and an external cohort of 852, the framework successfully transformed unstructured information into structured formats.
arXiv Detail & Related papers (2024-08-09T14:02:24Z)
- Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z)
- A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction [8.625186194860696]
We provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality.
To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes.
We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT.
arXiv Detail & Related papers (2022-08-09T03:49:52Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.