MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU
- URL: http://arxiv.org/abs/2510.24500v1
- Date: Tue, 28 Oct 2025 15:13:38 GMT
- Title: MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU
- Authors: Yong Huang, Zhongqi Yang, Amir Rahmani
- Abstract summary: We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data. We describe a transparent preprocessing pipeline (based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion) and release it alongside benchmark tasks.
- Score: 1.3849459413926863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage of clinical interventions. We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database, designed to support reproducible modeling of sepsis trajectories. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data, including vasopressors, fluids, mechanical ventilation, and antibiotics. We describe a transparent preprocessing pipeline (based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion) and release it alongside benchmark tasks focused on early mortality prediction, length-of-stay estimation, and shock onset classification. Empirical results demonstrate that incorporating treatment variables substantially improves model performance, particularly for Transformer-based architectures. MIMIC-Sepsis serves as a robust platform for evaluating predictive and sequential models in critical care research.
Related papers
- CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space [49.74032713886216]
CLARITY is a medical world model that forecasts disease evolution directly within a structured latent space.
It explicitly integrates time intervals (temporal context) and patient-specific data (clinical context) to model treatment-conditioned progression as a smooth, interpretable trajectory.
arXiv Detail & Related papers (2025-12-08T20:42:10Z)
- Enhancing mortality prediction in cardiac arrest ICU patients through meta-modeling of structured clinical data from MIMIC-IV [0.0]
This study develops and evaluates machine learning models that integrate structured clinical data and unstructured information.
We used LASSO and XGBoost for feature selection, followed by a logistic regression trained on the top features identified by both models.
The final logistic regression model, which combined structured and textual input, achieved an AUC of 0.918, compared to 0.753 when using structured data alone, a relative improvement of 22%.
arXiv Detail & Related papers (2025-10-20T20:56:45Z)
- Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z)
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z)
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
- Simulation-based Inference for Cardiovascular Models [43.55219268578912]
We use simulation-based inference to solve the inverse problem of mapping waveforms back to plausible physiological parameters.
We perform an in-silico uncertainty analysis of five biomarkers of clinical interest.
We study the gap between in-vivo and in-silico with the MIMIC-III waveform database.
arXiv Detail & Related papers (2023-07-26T02:34:57Z)
- Integrating Physiological Time Series and Clinical Notes with Transformer for Early Prediction of Sepsis [10.791880225915255]
Sepsis is a leading cause of death in intensive care units (ICUs).
We propose a multimodal Transformer model for early sepsis prediction.
We use the physiological time series data and clinical notes for each patient within 36 hours of ICU admission.
arXiv Detail & Related papers (2022-03-28T03:19:03Z)
- Early Prediction of Mortality in Critical Care Setting in Sepsis Patients Using Structured Features and Unstructured Clinical Notes [4.387308555401595]
Using the MIMIC-III database, we integrated demographic data, physiological measurements and clinical notes.
We built and applied several machine learning models to predict the risk of hospital mortality and 30-day mortality in sepsis patients.
arXiv Detail & Related papers (2021-11-09T19:57:05Z)
- Improving Early Sepsis Prediction with Multi Modal Learning [5.129463113166068]
Clinical text provides essential information to estimate the severity of sepsis.
We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text.
Our methods significantly outperform qSOFA, a clinical criterion suggested by experts, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for predicting sepsis.
arXiv Detail & Related papers (2021-07-23T09:25:31Z)
- MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response [58.0291320452122]
This paper aims at a unified deep learning approach to predict patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv Detail & Related papers (2020-10-08T15:30:17Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction [21.919977518774015]
We study how physiological time series data and clinical notes can be integrated into a unified mortality prediction model.
Our results show that a late fusion approach can provide a statistically significant improvement in mortality prediction over using individual modalities in isolation.
arXiv Detail & Related papers (2020-03-24T18:25:49Z)