Multimodal Pretraining of Medical Time Series and Notes
- URL: http://arxiv.org/abs/2312.06855v1
- Date: Mon, 11 Dec 2023 21:53:40 GMT
- Title: Multimodal Pretraining of Medical Time Series and Notes
- Authors: Ryan King, Tianbao Yang, Bobak Mortazavi
- Abstract summary: Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
- Score: 45.89025874396911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Within the intensive care unit (ICU), a wealth of patient data, including
clinical measurements and clinical notes, is readily available. This data is a
valuable resource for understanding patient health and informing medical
decisions, but it also poses many challenges for analysis. Deep learning
models show promise in extracting meaningful patterns, but they require
extensive labeled data, a challenge in critical care. To address this, we
propose a novel approach employing self-supervised pretraining, focusing on the
alignment of clinical measurements and notes. Our approach combines contrastive
and masked token prediction tasks during pretraining. Semi-supervised
experiments on the MIMIC-III dataset demonstrate the effectiveness of our
self-supervised pretraining. In downstream tasks, including in-hospital
mortality prediction and phenotyping, our pretrained model outperforms
baselines in settings where only a fraction of the data is labeled, emphasizing
its ability to enhance ICU data analysis. Notably, our method excels in
situations where very few labels are available, as evidenced by an increase in
the AUC-ROC for in-hospital mortality by 0.17 and in AUC-PR for phenotyping by
0.1 when only 1% of labels are accessible. This work advances self-supervised
learning in the healthcare domain, optimizing clinical insights from abundant
yet challenging ICU data.
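The pretraining objective above pairs a contrastive task with masked token prediction. The paper's exact formulation is not given here, but the contrastive half of such objectives is commonly implemented as a symmetric InfoNCE loss over matched (time-series, note) pairs from the same ICU stay. A minimal NumPy sketch of that alignment step, where the temperature, batch size, and embedding dimensions are illustrative assumptions and the masked-prediction head is omitted:

```python
import numpy as np

def info_nce_loss(ts_emb, note_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning time-series and note embeddings.

    Matching rows (same ICU stay) are the positive pairs; all other rows
    in the batch act as negatives.
    """
    # Cosine similarity matrix between the two modalities.
    ts = ts_emb / np.linalg.norm(ts_emb, axis=1, keepdims=True)
    nt = note_emb / np.linalg.norm(note_emb, axis=1, keepdims=True)
    logits = ts @ nt.T / temperature

    # Cross-entropy toward the diagonal, in both directions.
    n = logits.shape[0]
    log_sm_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    idx = np.arange(n)
    loss_ts2note = -log_sm_rows[idx, idx].mean()
    loss_note2ts = -log_sm_cols[idx, idx].mean()
    return (loss_ts2note + loss_note2ts) / 2

rng = np.random.default_rng(0)
ts_batch = rng.normal(size=(8, 16))
# Nearly-matched note embeddings (small perturbation of the same vectors).
loss_matched = info_nce_loss(ts_batch, ts_batch + 0.01 * rng.normal(size=(8, 16)))
# Unrelated note embeddings.
loss_random = info_nce_loss(ts_batch, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls each stay's time-series embedding toward its own note embedding and away from other stays' notes, which is what enables the label-efficient fine-tuning reported above.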
Related papers
- Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [68.8204255655161]
We introduce a deep Q-learning approach able to obtain more reliable critical care policies.
We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set.
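The two-step recipe in that summary (prune the action set using auxiliary rewards, then learn values from the sparse main reward over the restricted actions) can be illustrated with a toy bandit-style stand-in; the reward values, pruning threshold, and single-state setting below are illustrative assumptions, not the paper's deep Q-network:

```python
import numpy as np

# Toy setting: one state, four actions; an auxiliary reward signal
# flags action 3 as unsafe despite its high main reward.
aux_rewards = np.array([0.2, 0.5, 0.4, -1.0])
allowed = aux_rewards > 0.0  # step 1: prune actions with poor auxiliary reward

# Step 2: value learning on the sparse main reward, restricted to allowed actions.
main_reward = np.array([1.0, 0.0, 2.0, 5.0])  # action 3 pays most but is pruned
q = np.zeros(4)
rng = np.random.default_rng(0)
for _ in range(500):
    a = rng.choice(np.flatnonzero(allowed))  # explore only allowed actions
    q[a] += 0.1 * (main_reward[a] - q[a])    # incremental Q update (no next state)

best = int(np.argmax(np.where(allowed, q, -np.inf)))
```

The pruning step keeps the final policy from exploiting an action that looks good under the main reward alone, which is the reliability argument the summary makes.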
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
- FineEHR: Refine Clinical Note Representations to Improve Mortality Prediction [3.9026461169566673]
Large-scale electronic health records provide machine learning models with an abundance of clinical text and vital sign data.
Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges.
We propose FINEEHR, a system that utilizes two representation learning techniques, namely metric learning and fine-tuning, to refine clinical note embeddings.
arXiv Detail & Related papers (2023-04-24T02:42:52Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph based pre-training method helps in modeling the data at a population level.
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Improving Early Sepsis Prediction with Multi Modal Learning [5.129463113166068]
Clinical text provides essential information to estimate the severity of sepsis.
We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text.
Our methods significantly outperform a clinical criterion suggested by experts, qSOFA, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for sepsis prediction.
arXiv Detail & Related papers (2021-07-23T09:25:31Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that with 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
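The self-training loop behind results like that one (fit on the few labels, pseudo-label confident unlabeled points, refit on the enlarged set) can be sketched briefly; the nearest-centroid classifier, margin threshold, and synthetic data below are illustrative stand-ins, not the paper's chest X-ray models or regularizers:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class, in sorted class order.
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def predict_with_margin(centroids, X):
    # Distance to each centroid; margin = gap between best and second best.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    order = np.sort(d, axis=1)
    return d.argmin(axis=1), order[:, 1] - order[:, 0]

rng = np.random.default_rng(1)
# Two well-separated classes; only 4 of 100 points carry labels.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)
labeled = np.array([0, 1, 50, 51])
unlabeled = np.setdiff1d(np.arange(100), labeled)

# Round 1: fit on the few labels, then pseudo-label confident points.
centroids = nearest_centroid_fit(X[labeled], y_true[labeled])
pseudo, margin = predict_with_margin(centroids, X[unlabeled])
confident = margin > 2.0  # keep only high-margin pseudo-labels

# Round 2: refit on true labels plus confident pseudo-labels.
X2 = np.vstack([X[labeled], X[unlabeled][confident]])
y2 = np.concatenate([y_true[labeled], pseudo[confident]])
centroids2 = nearest_centroid_fit(X2, y2)
pred, _ = predict_with_margin(centroids2, X)
accuracy = (pred == y_true).mean()
```

The confidence filter matters: retraining on all pseudo-labels, including low-margin ones, is how self-training amplifies its own early mistakes.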
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
- Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction [21.919977518774015]
We study how physiological time series data and clinical notes can be integrated into a unified mortality prediction model.
Our results show that a late fusion approach can provide a statistically significant improvement in mortality prediction over using individual modalities in isolation.
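Late fusion in this sense means each modality gets its own model and only their outputs are combined at decision time. A weighted average of predicted probabilities is the simplest such rule; the probabilities and the equal weighting below are hypothetical, and the paper may instead learn the combination:

```python
import numpy as np

def late_fusion(p_ts, p_notes, w=0.5):
    """Combine per-modality mortality probabilities at the decision level.

    Each unimodal model is trained separately; fusion touches only
    their output probabilities, not their features.
    """
    return w * p_ts + (1 - w) * p_notes

# Hypothetical predicted mortality probabilities from two unimodal models.
p_time_series = np.array([0.80, 0.30, 0.55])
p_clinical_notes = np.array([0.60, 0.10, 0.65])
fused = late_fusion(p_time_series, p_clinical_notes)
```

Early fusion, by contrast, would concatenate features or embeddings before a single joint model; the summary's claim is that combining at the output level already beats either modality alone.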
arXiv Detail & Related papers (2020-03-24T18:25:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.