Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients
- URL: http://arxiv.org/abs/2101.04013v1
- Date: Mon, 11 Jan 2021 16:41:13 GMT
- Title: Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients
- Authors: Tingyi Wanyan, Hossein Honarvar, Suraj K. Jaladanki, Chengxi Zang, Nidhi Naik, Sulaiman Somani, Jessica K. De Freitas, Ishan Paranjpe, Akhil Vaid, Riccardo Miotto, Girish N. Nadkarni, Marinka Zitnik, Ariful Azad, Fei Wang, Ying Ding, Benjamin S. Glicksberg
- Abstract summary: We show that contrastive loss (CL) improves the performance of cross-entropy loss (CEL) for imbalanced EHR data.
This study has been approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai.
- Score: 19.419685256069666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) models typically require large-scale, balanced training
data to be robust, generalizable, and effective in the context of healthcare.
This has been a major issue for developing ML models for the
coronavirus disease 2019 (COVID-19) pandemic, where data are highly imbalanced,
particularly within electronic health records (EHR) research. Conventional
approaches in ML use cross-entropy loss (CEL), which often suffers from poor
classification margins. For the first time, we show that contrastive loss (CL)
improves the performance of CEL, especially for imbalanced EHR data and the
related COVID-19 analyses. This study has been approved by the Institutional
Review Board at the Icahn School of Medicine at Mount Sinai. We use EHR data
from five hospitals within the Mount Sinai Health System (MSHS) to predict
mortality, intubation, and intensive care unit (ICU) transfer in hospitalized
COVID-19 patients over 24 and 48 hour time windows. We train two sequential
architectures (RNN and RETAIN) using two loss functions (CEL and CL). Models
are tested on a full sample data set, which contains all available data, and on a
restricted data set that emulates higher class imbalance. CL models consistently
outperform CEL models with the restricted data set on these tasks with
differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC. For
the restricted sample, only the CL model maintains proper clustering and is
able to identify important features, such as pulse oximetry. CL outperforms CEL
in instances of severe class imbalance on three EHR outcomes with respect to
three performance metrics: predictive power, clustering, and feature
importance. We believe that the developed CL framework can be expanded and used
for EHR ML work in general.
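The abstract does not spell out the loss itself. As a rough illustration only, the sketch below assumes a supervised contrastive loss in the style of Khosla et al. (2020), added to binary cross-entropy with a hypothetical weight `lam` and temperature `temperature`; the paper's exact formulation, weighting, and encoder (RNN or RETAIN) may differ. It shows the mechanism the abstract relies on: embeddings of patients with the same outcome are pulled together and those with different outcomes are pushed apart, which matters most when the positive class (e.g. mortality) is rare.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Assumed SupCon-style loss: for each anchor, average -log p(positive)
    over same-label samples, where p is a softmax over all other samples."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner is skipped
        logits = [_dot(z[i], z[a]) / temperature for a in range(n) if a != i]
        # Stable log-sum-exp over every other sample (the softmax denominator).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        log_probs = [_dot(z[i], z[p]) / temperature - log_denom for p in positives]
        total += -sum(log_probs) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


def binary_cross_entropy(
    probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7
) -> float:
    """Mean binary cross-entropy (CEL), clipped to avoid log(0)."""
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(probs, labels)
    ) / len(labels)


def combined_loss(embeddings, probs, labels, lam: float = 0.5, temperature: float = 0.1) -> float:
    """Hypothetical objective: CEL plus a weighted contrastive term."""
    return binary_cross_entropy(probs, labels) + lam * supervised_contrastive_loss(
        embeddings, labels, temperature
    )


# Usage: four patients, two outcomes (0 = no event, 1 = critical event).
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # grouped by outcome
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # outcomes interleaved
print(supervised_contrastive_loss(clustered, labels))  # close to zero
print(supervised_contrastive_loss(mixed, labels))      # much larger
```

With these toy embeddings, grouping patients by outcome drives the contrastive term close to zero, while interleaving the outcomes makes it large. This is the kind of outcome-level clustering the abstract reports only the CL model maintaining on the restricted sample.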
Related papers
- Enhancing Glucose Level Prediction of ICU Patients through Irregular Time-Series Analysis and Integrated Representation [4.101915841246237]
We develop a novel learning-based model to forecast the next level, classifying it into hypoglycemia, hyperglycemia, or euglycemia.
This study focuses on predicting blood glucose levels in ICU patients, but MITST can easily be extended to other critical event prediction tasks.
(2024-11-03T03:03:11Z)
- FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data [52.55123685248105]
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment.
Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial data with high quality.
This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
(2024-10-28T02:24:01Z)
- CEL: A Continual Learning Model for Disease Outbreak Prediction by Leveraging Domain Adaptation via Elastic Weight Consolidation [4.693707128262634]
This study introduces a novel CEL model for continual learning by leveraging domain adaptation via Elastic Weight Consolidation (EWC).
CEL's robustness and reliability are underscored by its minimal 65% forgetting rate and 18% higher memory stability compared to existing benchmark studies.
(2024-01-17T03:26:04Z)
- Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
(2023-12-31T16:01:48Z)
- Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients [7.639610349097473]
We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, surpassing conventional ML models without such projections.
(2023-08-21T15:14:49Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
(2022-12-17T23:29:14Z)
- Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data [89.79617468457393]
Training models on data with a high imbalance rate (class density discrepancy) may lead to suboptimal prediction.
We propose a training framework that addresses this imbalance issue.
We demonstrate our model's improved performance in real-world medical datasets.
(2022-07-23T00:39:53Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- Predicting Hyperkalemia in the ICU and Evaluation of Generalizability and Interpretability [5.9854349801427285]
Hyperkalemia is a potentially life-threatening condition that can lead to fatal arrhythmias.
We developed predictive models to identify intensive care unit (ICU) patients at risk of developing hyperkalemia.
Our models were able to predict hyperkalemia with an AUC of (i) 0.79, 0.81, 0.81 and (ii) 0.81, 0.85, 0.85 for LR, RF, and XGBoost respectively.
(2021-01-16T12:35:27Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and collapsing health care structures.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
(2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.