Rare Event Early Detection: A Dataset of Sepsis Onset for Critically Ill Trauma Patients
- URL: http://arxiv.org/abs/2602.02930v1
- Date: Tue, 03 Feb 2026 00:04:25 GMT
- Title: Rare Event Early Detection: A Dataset of Sepsis Onset for Critically Ill Trauma Patients
- Authors: Yin Jin, Tucker R. Stewart, Deyi Zhou, Chhavi Gupta, Arjita Nema, Scott C. Brakenridge, Grant E. O'Keefe, Juhua Hu,
- Abstract summary: We introduce a publicly available, standardized post-trauma sepsis onset dataset, extracted from MIMIC-III, relabeled using standardized post-trauma clinical facts, and validated. We frame early detection of post-trauma sepsis onset on a daily basis, following the clinical workflow in ICUs, which results in a new rare event detection problem.
- Score: 7.181818511491392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sepsis is a major public health concern due to its high morbidity, mortality, and cost. Its clinical outcome can be substantially improved through early detection and timely intervention. By leveraging publicly available datasets, machine learning (ML) has driven advances in both research and clinical practice. However, existing public datasets treat intensive care unit (ICU) patients as a uniform group and neglect the potential challenges posed by critically ill trauma patients, in whom injury-related inflammation and organ dysfunction can overlap with the clinical features of sepsis. We propose that targeted identification of post-traumatic sepsis is necessary to develop methods for its early detection. Therefore, we introduce a publicly available, standardized post-trauma sepsis onset dataset, extracted from MIMIC-III, relabeled using standardized post-trauma clinical facts, and validated. Furthermore, we frame early detection of post-trauma sepsis onset on a daily basis, following the clinical workflow in ICUs, which results in a new rare event detection problem. We then establish a general benchmark through comprehensive experiments, which show that further methodological advances are needed on this new dataset. The dataset code is available at https://github.com/ML4UWHealth/SepsisOnset_TraumaCohort.git.
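The abstract frames early detection as a once-per-day decision, which makes positives rare. As a minimal sketch, assuming a hypothetical labeling rule (the sample for ICU day d is positive when onset falls on day d + 1, and days on or after onset are dropped), the code below turns ICU stays into day-level samples and measures how rare positives become. The function names, the rule, and the toy stays are illustrative assumptions, not the dataset's actual labeling code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DailySample:
    patient_id: str
    day: int    # ICU day index (0 = admission day)
    label: int  # 1 if sepsis onset falls on the next day, else 0


def build_daily_samples(stays: List[Tuple[str, int, Optional[int]]]) -> List[DailySample]:
    """Turn each ICU stay into one prediction sample per day.

    stays: (patient_id, length_of_stay_days, onset_day or None).
    Hypothetical rule: day d is positive when onset occurs on day d + 1;
    days on or after onset are dropped because sepsis is no longer "early".
    """
    samples: List[DailySample] = []
    for pid, los_days, onset in stays:
        for d in range(los_days):
            if onset is not None and d >= onset:
                break  # stop generating samples once onset has happened
            label = int(onset is not None and onset == d + 1)
            samples.append(DailySample(pid, d, label))
    return samples


def prevalence(samples: List[DailySample]) -> float:
    """Fraction of daily samples that are positive."""
    return sum(s.label for s in samples) / len(samples)


# Toy stays: one septic patient (onset on day 3) and two non-septic patients.
stays = [("a", 5, 3), ("b", 4, None), ("c", 6, None)]
samples = build_daily_samples(stays)
print(f"{len(samples)} daily samples, prevalence = {prevalence(samples):.3f}")
```

Even with one septic stay in three, only 1 of the 13 daily samples is positive, so a model that always predicts "no onset" reaches about 92% accuracy; this is why rare event benchmarks usually report precision-recall style metrics rather than accuracy alone.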
Related papers
- FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment.
Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial, high-quality data.
This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
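The summary does not say which federated algorithms FedCVD benchmarks. As an assumption for illustration, the sketch below shows federated averaging (FedAvg), a standard baseline in which each site trains locally and a server averages the parameters, weighted by each site's sample count, so raw patient data never leave the site. The dict-of-lists parameter layout and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals; the larger one pulls the global model toward its weights.
global_model = fed_avg([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [100, 300])
print(global_model)
```

Weighting by sample count makes the global update match what a single model would see if every site's gradient steps were pooled, at the cost of letting large sites dominate.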
arXiv: 2024-10-28
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For potentially high-risk patients whose predictions have low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv: 2024-07-24
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv: 2024-06-30
- NPRL: Nightly Profile Representation Learning for Early Sepsis Onset Prediction in ICU Trauma Patients
Sepsis is a syndrome that develops in the body in response to the presence of an infection.
Current machine learning algorithms have demonstrated poor performance and are insufficient for anticipating sepsis onset early.
We propose a novel but realistic prediction framework that predicts sepsis onset each morning using the most recent data collected the previous night.
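Predicting each morning from the previous night's data starts with cutting a night window and summarizing it. The summary does not give the window boundaries or the features, so the sketch below assumes a hypothetical 22:00 to 06:00 window and simple per-variable statistics (mean, min, max, count); NPRL itself learns a representation on top of such data, which is not shown. Variable names and values are toy examples.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Tuple

# (timestamp, variable name, value); variable names here are hypothetical.
Observation = Tuple[datetime, str, float]


def night_window(morning: datetime, start_hour: int = 22, end_hour: int = 6) -> Tuple[datetime, datetime]:
    """Return [previous evening at start_hour, this morning at end_hour)."""
    midnight = morning.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(hours=24 - start_hour), midnight + timedelta(hours=end_hour)


def nightly_profile(obs: List[Observation], morning: datetime) -> Dict[str, Dict[str, float]]:
    """Summarize each variable measured inside the previous night's window."""
    start, end = night_window(morning)
    grouped: Dict[str, List[float]] = {}
    for ts, name, value in obs:
        if start <= ts < end:
            grouped.setdefault(name, []).append(value)
    return {
        name: {"mean": mean(vals), "min": min(vals), "max": max(vals), "count": float(len(vals))}
        for name, vals in grouped.items()
    }


obs = [
    (datetime(2024, 1, 1, 21, 0), "heart_rate", 80.0),   # before the window, ignored
    (datetime(2024, 1, 1, 23, 0), "heart_rate", 90.0),
    (datetime(2024, 1, 2, 3, 0), "heart_rate", 110.0),
    (datetime(2024, 1, 2, 5, 0), "temperature", 38.5),
    (datetime(2024, 1, 2, 6, 30), "heart_rate", 120.0),  # after the window, ignored
]
profile = nightly_profile(obs, morning=datetime(2024, 1, 2, 7, 0))
print(profile)
```

Using only a fixed overnight window keeps the input available by morning rounds, at the cost of ignoring daytime measurements until the next night.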
arXiv: 2023-04-25
- ALRt: An Active Learning Framework for Irregularly Sampled Temporal Data
Sepsis is a deadly condition affecting many patients in the hospital.
We propose the use of Active Learning Recurrent Neural Networks (ALRts) for short temporal horizons to improve the prediction of irregularly sampled temporal events such as sepsis.
We show that an active learning RNN model trained on limited data can form robust sepsis predictions comparable to models using the entire training dataset.
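The summary does not detail ALRt's query strategy. As a generic illustration, assumed rather than taken from the paper, uncertainty sampling is one common active-learning rule: request labels for the unlabeled samples the current model is least sure about, which for a binary sepsis classifier means predicted probabilities closest to 0.5. The function name is hypothetical.

```python
from typing import List, Sequence, Set


def select_queries(probs: Sequence[float], already_labeled: Set[int], k: int) -> List[int]:
    """Pick k unlabeled pool indices whose predicted probability is closest to 0.5."""
    candidates = [i for i in range(len(probs)) if i not in already_labeled]
    # Smaller distance from 0.5 means the model is less certain about that sample.
    candidates.sort(key=lambda i: abs(probs[i] - 0.5))
    return candidates[:k]


pool_probs = [0.05, 0.48, 0.9, 0.61, 0.3]  # toy model outputs on an unlabeled pool
print(select_queries(pool_probs, already_labeled=set(), k=2))
```

After each round, the queried samples are labeled, added to the training set, and the model is retrained before the next selection.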
arXiv: 2022-12-13
- Improving Early Sepsis Prediction with Multi Modal Learning
Clinical text provides essential information to estimate the severity of sepsis.
We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text.
Our methods significantly outperform qSOFA, a clinical criterion suggested by experts, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for predicting sepsis.
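qSOFA (quick Sequential Organ Failure Assessment), from the Sepsis-3 consensus definitions, awards one point each for a respiratory rate of at least 22 breaths/min, altered mentation, and a systolic blood pressure of at most 100 mmHg; a score of 2 or more flags higher risk. The sketch below assumes altered mentation is operationalized as a Glasgow Coma Scale (GCS) below 15, a common but not universal choice; the function names are hypothetical.

```python
def qsofa(resp_rate: float, systolic_bp: float, gcs: int) -> int:
    """qSOFA score (0-3); altered mentation is assumed to mean GCS < 15."""
    score = 0
    score += int(resp_rate >= 22)     # respiratory rate >= 22 / min
    score += int(systolic_bp <= 100)  # systolic blood pressure <= 100 mmHg
    score += int(gcs < 15)            # altered mentation (assumed GCS < 15)
    return score


def qsofa_positive(resp_rate: float, systolic_bp: float, gcs: int) -> bool:
    """A score of 2 or more is the usual positive threshold."""
    return qsofa(resp_rate, systolic_bp, gcs) >= 2


print(qsofa(24, 95, 15), qsofa_positive(24, 95, 15))
```

Because qSOFA uses only three bedside signs and a fixed threshold, it is a natural rule-based baseline for learned predictors such as the one summarized above.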
arXiv: 2021-07-23
- Machine learning-based analysis of hyperspectral images for automated sepsis diagnosis
Automated machine learning-based diagnosis of sepsis based on hyperspectral imaging data has not been explored to date.
While we were able to classify sepsis with an accuracy of over 98% using the existing data, our research also revealed several subject-, therapy-, and imaging-related confounders.
arXiv: 2021-06-15
- MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response
This paper aims to develop a unified deep learning approach for predicting patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv: 2020-10-08
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv: 2020-07-07
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making directed at reducing the incidence rate of infections.
arXiv: 2020-05-07
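The last entry names clustering-based undersampling, which also applies to rare event settings like daily sepsis onset. The summary gives no details, so the sketch below shows one generic variant under stated assumptions: cluster the majority (negative) class with k-means, with k set to the number of samples to keep, then keep the real sample nearest each centroid, so the reduced class still spans its regions of feature space. It uses deterministic farthest-point seeding instead of random initialization; all names and toy points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Start at the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[Point]:
    centroids = _farthest_point_init(points, k)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority: List[Point], n_keep: int) -> List[Point]:
    """Reduce the majority class to n_keep real samples, one nearest each k-means centroid."""
    if not 0 < n_keep <= len(majority):
        raise ValueError("n_keep must be between 1 and len(majority)")
    remaining = list(majority)
    kept: List[Point] = []
    for centroid in kmeans(majority, n_keep):
        best = min(remaining, key=lambda p: _dist2(p, centroid))
        kept.append(best)
        remaining.remove(best)  # never keep the same sample twice
    return kept


# Toy majority class with three well-separated groups.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
            (0.0, 10.0), (0.1, 10.0), (0.0, 10.1)]
kept = cluster_undersample(majority, n_keep=3)
print(kept)
```

In practice n_keep would typically be set to the minority-class size, and the kept negatives combined with all positives to form a balanced training set; keeping real samples rather than centroids avoids training on synthetic points.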