Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients
- URL: http://arxiv.org/abs/2308.10781v1
- Date: Mon, 21 Aug 2023 15:14:49 GMT
- Title: Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients
- Authors: Mehak Arora, Hassan Mortagy, Nathan Dwarshius, Swati Gupta, Andre L. Holder, Rishikesan Kamaleswaran
- Abstract summary: We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, which surpasses conventional ML models trained without such projections.
- Score: 7.639610349097473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models are increasingly pivotal in automating clinical
decisions. Yet a glaring oversight in prior research has been the lack of proper
processing of Electronic Medical Record (EMR) data for errors and outliers in the
clinical context. Addressing this oversight, we introduce an innovative
projections-based method that seamlessly integrates clinical expertise as domain
constraints, generating important metadata that can be used in ML workflows. In
particular, by using high-dimensional mixed-integer programs that capture
physiological and biological constraints on patient vitals and lab values, we
correct patient data by computing mathematical "projections" of the EMR data onto
these constraints. We then measure the distance of the corrected data from the
constraints defining a healthy range of patient data, resulting in a predictive
metric we term "trust-scores". These scores provide insight into the patient's
health status and significantly boost the performance of ML classifiers in
real-life clinical settings. We validate the impact of our framework in the
context of early detection of sepsis using ML, achieving an AUROC of 0.865 and a
precision of 0.922, which surpasses conventional ML models trained without such
projections.
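The correct-then-score pipeline described in the abstract can be sketched in a few lines under a strong simplifying assumption: every constraint is an independent lower/upper bound on a single variable. In that case the Euclidean projection onto the feasible set separates by coordinate and reduces to clipping, so no mixed-integer solver is needed. The paper's high-dimensional mixed-integer programs also capture coupled and discrete physiological constraints, which this sketch does not model. The variable names, the "plausible" and "healthy" bounds, the helper names, and the normalization of each gap by the healthy-range width are all hypothetical illustrations, not the authors' values or formulation.

```python
import math

# Hypothetical, illustrative bounds (not clinical reference values and not the
# paper's constraints). Values outside PLAUSIBLE are treated as recording errors.
PLAUSIBLE = {"heart_rate": (20.0, 250.0), "temp_c": (30.0, 43.0), "resp_rate": (4.0, 60.0)}
# Hypothetical "healthy" region that the score measures distance from.
HEALTHY = {"heart_rate": (60.0, 100.0), "temp_c": (36.0, 38.0), "resp_rate": (12.0, 20.0)}


def project_onto_box(record, box):
    """Euclidean projection onto per-variable bounds: clip each value independently."""
    return {k: min(max(v, box[k][0]), box[k][1]) for k, v in record.items()}


def distance_to_box(record, box):
    """Weighted Euclidean distance to a box; each per-variable gap is divided by
    the box width so variables in different units are comparable (an assumption)."""
    total = 0.0
    for k, v in record.items():
        lo, hi = box[k]
        gap = max(lo - v, 0.0, v - hi)  # 0 when v lies inside [lo, hi]
        total += (gap / (hi - lo)) ** 2
    return math.sqrt(total)


def correct_and_score(record):
    """Hypothetical helper: correct a record, then score its distance from HEALTHY."""
    corrected = project_onto_box(record, PLAUSIBLE)
    return corrected, distance_to_box(corrected, HEALTHY)


corrected, score = correct_and_score({"heart_rate": 400.0, "temp_c": 38.5, "resp_rate": 16.0})
print(corrected, round(score, 2))
```

Here the implausible heart rate of 400 is clipped to 250, and the score is the normalized distance of the corrected record from the healthy box (about 3.76); in this sketch a larger score means a record farther from the healthy range.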
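The abstract reports an AUROC of 0.865 and a precision of 0.922. As a reminder of what these two numbers measure, here is a minimal pure-Python sketch computing both from binary labels and classifier scores. The toy labels and scores are made up, and the 0.5 decision threshold is an assumption, since the abstract does not state the operating point at which precision was measured.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision(labels, scores, threshold=0.5):
    """Share of cases flagged positive (score >= threshold) that are truly positive."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        raise ValueError("no cases flagged positive at this threshold")
    return sum(flagged) / len(flagged)


labels = [1, 1, 0, 0, 1, 0]  # 1 = sepsis (made-up toy data)
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(auroc(labels, scores), precision(labels, scores))
```

The pairwise loop is O(P·N) and is written for clarity; for large cohorts a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.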
Related papers
- Evaluating Machine Learning Models against Clinical Protocols for Enhanced Interpretability and Continuity of Care [39.58317527488534]
In clinical practice, decision-making relies heavily on established protocols, often formalised as rules.
Despite the growing number of Machine Learning applications, their adoption into clinical practice remains limited.
We propose metrics to assess the accuracy of ML models with respect to the established protocol.
arXiv (2024-11-05T13:50:09Z)
- When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? [8.89829757177796]
We examine the effectiveness of vector representations from the last hidden states of Large Language Models for medical diagnostics and prognostics.
We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utility as feature extractors.
Although the findings suggest that raw data features still prevail in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results.
arXiv (2024-08-15T03:56:40Z)
- Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality [0.0]
This research aims to develop an interpretable and accurate ML model to help clinical professionals predict in-hospital mortality.
We analyzed ICU patient records from the MIMIC-III database based on specific criteria and extracted relevant data.
The Random Forest model was the most effective in predicting sepsis-related in-hospital mortality.
arXiv (2024-08-03T00:28:25Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, conducted as part of the iDPP@CLEF 2024 challenge, focuses on using sensor-derived data obtained through an app.
arXiv (2024-07-10T19:17:23Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv (2023-03-24T03:14:00Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv (2022-12-17T23:29:14Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv (2022-06-16T17:59:05Z)
- Modeling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing [2.016042047576802]
We investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports.
arXiv (2022-05-29T20:20:49Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves an F1 score of up to 0.841 for AD detection and a PR-AUC of up to 0.609 for NASH detection, and outperforms various state-of-the-art baselines, improving on the best baseline by up to 19%.
arXiv (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.