Mixed-Integer Projections for Automated Data Correction of EMRs Improve
Predictions of Sepsis among Hospitalized Patients
- URL: http://arxiv.org/abs/2308.10781v1
- Date: Mon, 21 Aug 2023 15:14:49 GMT
- Title: Mixed-Integer Projections for Automated Data Correction of EMRs Improve
Predictions of Sepsis among Hospitalized Patients
- Authors: Mehak Arora, Hassan Mortagy, Nathan Dwarshius, Swati Gupta, Andre L.
Holder, Rishikesan Kamaleswaran
- Abstract summary: We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, surpassing conventional ML models without such projections.
- Score: 7.639610349097473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models are increasingly pivotal in automating clinical
decisions. Yet, a glaring oversight in prior research has been the lack of
proper processing of Electronic Medical Record (EMR) data in the clinical
context for errors and outliers. Addressing this oversight, we introduce an
innovative projections-based method that seamlessly integrates clinical
expertise as domain constraints, generating important meta-data that can be
used in ML workflows. In particular, by using high-dimensional mixed-integer
programs that capture physiological and biological constraints on patient
vitals and lab values, we can harness the power of mathematical "projections"
for the EMR data to correct patient data. Consequently, we measure the distance
of corrected data from the constraints defining a healthy range of patient
data, resulting in a unique predictive metric we term "trust-scores". These
scores provide insight into the patient's health status and significantly boost
the performance of ML classifiers in real-life clinical settings. We validate
the impact of our framework in the context of early detection of sepsis using
ML. We show an AUROC of 0.865 and a precision of 0.922, surpassing
conventional ML models without such projections.
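The abstract describes two steps: projecting raw vitals and lab values onto a set of physiologically feasible values (the data correction), then scoring how far the corrected record sits from a healthy region (the trust-score). The paper formulates the projection as a high-dimensional mixed-integer program; the Python sketch below swaps in the simplest special case, where both sets are axis-aligned boxes, so the projection reduces to per-feature clipping and needs no solver. The feature names, the bounds, the per-feature width normalization, and the helper names (`project_to_box`, `distance_to_box`, `correct_and_score`) are all hypothetical; they are not the authors' constraints or code, and the bounds are illustrative rather than clinical reference values.

```python
import math
from typing import Dict, Tuple

# Per-feature (lower, upper) bounds, keyed by a hypothetical feature name.
Bounds = Dict[str, Tuple[float, float]]

# Illustrative bounds only; NOT clinical reference values and NOT the paper's.
# Values outside PLAUSIBLE are treated as recording errors to be corrected.
PLAUSIBLE: Bounds = {"heart_rate": (0.0, 300.0), "temp_c": (25.0, 45.0)}
# A narrower, hypothetical "healthy" box; distance from it is the score.
HEALTHY: Bounds = {"heart_rate": (60.0, 100.0), "temp_c": (36.0, 38.0)}


def project_to_box(x: Dict[str, float], box: Bounds) -> Dict[str, float]:
    """Nearest point of an axis-aligned box: clip each feature independently."""
    return {k: min(max(v, box[k][0]), box[k][1]) for k, v in x.items()}


def distance_to_box(x: Dict[str, float], box: Bounds) -> float:
    """Distance from x to the box, with each feature's gap divided by the
    box width so that features measured in different units are comparable."""
    nearest = project_to_box(x, box)
    return math.sqrt(sum(
        ((x[k] - nearest[k]) / (box[k][1] - box[k][0])) ** 2 for k in x
    ))


def correct_and_score(raw: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Step 1: project raw values onto the plausible set (data correction).
    Step 2: score the corrected record by its distance from the healthy set."""
    corrected = project_to_box(raw, PLAUSIBLE)
    return corrected, distance_to_box(corrected, HEALTHY)


# Elevated heart rate and temperature, but no recording errors to correct.
print(correct_and_score({"heart_rate": 120.0, "temp_c": 39.0}))
# The implausible temperature is clipped to 45.0 before scoring.
print(correct_and_score({"heart_rate": 80.0, "temp_c": 55.0}))
```

Clipping is the exact nearest point here only because box constraints are separable per feature, and the per-feature weights keep them separable. Constraints that couple several vitals or labs, or that involve discrete choices as in the paper's mixed-integer formulation, break this separability and call for an optimization solver instead.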
Related papers
- Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database [0.9055332067000195]
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality.
Predictive models for lung metastasis in HCC remain limited in scope and clinical applicability.
We develop and validate an end-to-end machine learning pipeline using data from the Surveillance, Epidemiology, and End Results (SEER) database.
(2025-01-20T20:06:31Z)
- Representation Learning of Lab Values via Masked AutoEncoder [2.785172582119726]
We propose Lab-MAE, a transformer-based masked autoencoder framework for imputation of sequential lab values.
Empirical evaluation on the MIMIC-IV dataset demonstrates that Lab-MAE significantly outperforms the state-of-the-art baselines.
Lab-MAE achieves equitable performance across demographic groups of patients, advancing fairness in clinical predictions.
(2025-01-05T20:26:49Z)
- Machine Learning-Based Prediction of ICU Readmissions in Intracerebral Hemorrhage Patients: Insights from the MIMIC Databases [0.0]
Intracerebral hemorrhage (ICH) is a life-threatening condition characterized by bleeding within the brain parenchyma.
This study utilized the Medical Information Mart for Intensive Care (MIMIC-III and MIMIC-IV) databases to develop predictive models for ICU readmission risk.
(2025-01-02T10:19:27Z)
- Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality [0.0]
Sepsis is a severe condition responsible for many deaths in the United States and worldwide.
Previous studies employing machine learning faced limitations in feature selection and model interpretability.
This research aimed to develop an interpretable and accurate machine learning model to predict in-hospital sepsis mortality.
(2024-08-03T00:28:25Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, carried out within the iDPP@CLEF 2024 challenge, focuses on using sensor-derived data obtained through an app.
(2024-07-10T19:17:23Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
(2023-03-24T03:14:00Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD on data from Brazilian hospitals.
(2022-12-17T23:29:14Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
(2022-06-16T17:59:05Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
(2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.