Multimodal Pretraining of Medical Time Series and Notes
- URL: http://arxiv.org/abs/2312.06855v1
- Date: Mon, 11 Dec 2023 21:53:40 GMT
- Title: Multimodal Pretraining of Medical Time Series and Notes
- Authors: Ryan King, Tianbao Yang, Bobak Mortazavi
- Abstract summary: Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
- Score: 45.89025874396911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Within the intensive care unit (ICU), a wealth of patient data, including
clinical measurements and clinical notes, is readily available. This data is a
valuable resource for comprehending patient health and informing medical
decisions, but it also poses many challenges for analysis. Deep learning
models show promise in extracting meaningful patterns, but they require
extensive labeled data, a challenge in critical care. To address this, we
propose a novel approach employing self-supervised pretraining, focusing on the
alignment of clinical measurements and notes. Our approach combines contrastive
and masked token prediction tasks during pretraining. Semi-supervised
experiments on the MIMIC-III dataset demonstrate the effectiveness of our
self-supervised pretraining. In downstream tasks, including in-hospital
mortality prediction and phenotyping, our pretrained model outperforms
baselines in settings where only a fraction of the data is labeled, emphasizing
its ability to enhance ICU data analysis. Notably, our method excels in
situations where very few labels are available, as evidenced by an increase in
the AUC-ROC for in-hospital mortality by 0.17 and in AUC-PR for phenotyping by
0.1 when only 1% of labels are accessible. This work advances self-supervised
learning in the healthcare domain, optimizing clinical insights from abundant
yet challenging ICU data.
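The paper's pretraining objective pairs a contrastive alignment term over matched (time series, note) pairs with a masked-prediction term. Below is a minimal PyTorch sketch under assumed shapes and stand-in encoder outputs, not the authors' implementation; the masked task is shown here on measurement values, though the paper's masked token prediction may instead operate on note tokens.

```python
# Minimal sketch, NOT the authors' implementation: a symmetric InfoNCE
# alignment loss between paired time-series and note embeddings, plus a
# masked-value prediction loss. Shapes and encoders are stand-ins.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(ts_emb, note_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (series, note) pairs lie on the diagonal."""
    ts_emb = F.normalize(ts_emb, dim=-1)
    note_emb = F.normalize(note_emb, dim=-1)
    logits = ts_emb @ note_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(ts_emb.size(0), device=ts_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def masked_prediction_loss(pred, target, mask):
    """MSE restricted to entries masked out of the encoder input."""
    return ((pred - target) ** 2 * mask).sum() / mask.sum().clamp(min=1)

# Stand-in tensors: B stays, T time steps, D measurements, H embedding dim.
B, T, D, H = 32, 48, 17, 128
ts_emb = torch.randn(B, H, requires_grad=True)    # would come from a TS encoder
note_emb = torch.randn(B, H, requires_grad=True)  # would come from a text encoder
pred = torch.randn(B, T, D, requires_grad=True)   # reconstruction head output
target = torch.randn(B, T, D)
mask = (torch.rand(B, T, D) < 0.15).float()       # 15% of entries masked

loss = (contrastive_alignment_loss(ts_emb, note_emb)
        + masked_prediction_loss(pred, target, mask))
loss.backward()
```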
Related papers
- An Efficient Contrastive Unimodal Pretraining Method for EHR Time Series Data [35.943089444017666]
We propose an efficient method of contrastive pretraining tailored for long clinical time-series data.
Our model demonstrates the ability to impute missing measurements, providing clinicians with deeper insights into patient conditions; a rough crop-based contrastive recipe is sketched below.
arXiv Detail & Related papers (2024-10-11T19:05:25Z)
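As a rough illustration of unimodal contrastive pretraining on long series (an assumed common recipe, not necessarily this paper's exact method), two random temporal crops of the same stay can serve as a positive pair:

```python
# Hedged sketch: positive-pair construction by random temporal cropping.
# Crop length and tensor shapes are assumptions.
import torch

def random_crop(x, crop_len):
    """Random temporal crop of a (T, D) series; assumes T >= crop_len."""
    start = torch.randint(0, x.size(0) - crop_len + 1, (1,)).item()
    return x[start:start + crop_len]

def two_views(stays, crop_len=64):
    """Two independently cropped views per stay form the positive pairs."""
    v1 = torch.stack([random_crop(x, crop_len) for x in stays])
    v2 = torch.stack([random_crop(x, crop_len) for x in stays])
    return v1, v2

# Stand-in batch: 8 stays, 200 hourly steps, 17 measurements each.
stays = [torch.randn(200, 17) for _ in range(8)]
v1, v2 = two_views(stays)
# An encoder maps each view to an embedding; an InfoNCE loss (as in the
# sketch above) then pulls v1[i] toward v2[i] and away from v2[j != i].
```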
- Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques to mitigate biases and calls for collaboration between healthcare professionals and data scientists; a minimal per-group audit is sketched below.
arXiv Detail & Related papers (2023-12-31T16:01:48Z)
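A minimal per-group audit in the spirit of this analysis might look like the following; the DataFrame columns (feature list, binary long-stay label, demographic group column) are hypothetical placeholders rather than MIMIC-IV's actual schema, and the XGBoost settings are illustrative:

```python
# Hedged sketch: train a binary LOS classifier and report AUC-ROC per
# demographic group. Column names are hypothetical placeholders.
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def per_group_auc(df, feature_cols, label_col="long_stay", group_col="age_group"):
    train, test = train_test_split(df, test_size=0.3,
                                   stratify=df[label_col], random_state=0)
    model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    model.fit(train[feature_cols], train[label_col])
    test = test.copy()
    test["score"] = model.predict_proba(test[feature_cols])[:, 1]
    # Per-group AUC exposes gaps that the overall AUC can hide; in real use,
    # groups containing a single outcome class would need guarding.
    return {g: roc_auc_score(sub[label_col], sub["score"])
            for g, sub in test.groupby(group_col)}
```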
- FineEHR: Refine Clinical Note Representations to Improve Mortality Prediction [3.9026461169566673]
Large-scale electronic health records provide machine learning models with an abundance of clinical text and vital sign data.
Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges.
We propose FINEEHR, a system that uses two representation learning techniques, metric learning and fine-tuning, to refine clinical note embeddings; a triplet-loss sketch follows below.
arXiv Detail & Related papers (2023-04-24T02:42:52Z)
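A loose sketch of the metric-learning half of such a pipeline: a small projection head over precomputed note embeddings, trained with a triplet loss. The head size, margin, and 768-dimensional inputs are assumptions, not FINEEHR's exact architecture.

```python
# Hedged sketch: refine note embeddings with a triplet loss so notes with
# the same outcome label cluster together. Dimensions are assumptions.
import torch
import torch.nn as nn

proj = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
triplet = nn.TripletMarginLoss(margin=1.0)

# Anchor and positive share a label (e.g., same mortality outcome);
# the negative comes from the opposite class. Random stand-ins here.
anchor = proj(torch.randn(32, 768))     # raw note embeddings, e.g., from BERT
positive = proj(torch.randn(32, 768))
negative = proj(torch.randn(32, 768))
loss = triplet(anchor, positive, negative)
loss.backward()
```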
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph-based pre-training method helps in modeling the data at a population level.
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Improving Early Sepsis Prediction with Multi Modal Learning [5.129463113166068]
Clinical text provides essential information to estimate the severity of sepsis.
We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text.
Our method significantly outperforms qSOFA, a clinical criterion suggested by experts, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for sepsis prediction; the note-embedding step is sketched below.
arXiv Detail & Related papers (2021-07-23T09:25:31Z)
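The text-representation step can be illustrated with a pretrained clinical BERT from the Hugging Face hub; Bio_ClinicalBERT below is a stand-in choice, not necessarily the model this paper used (which also relied on Amazon Comprehend Medical):

```python
# Hedged sketch: one embedding vector per clinical note via a pretrained
# clinical BERT (stand-in model choice; requires a model download).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
bert = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

note = "Patient febrile and hypotensive; blood cultures drawn, lactate elevated."
inputs = tok(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    note_vec = bert(**inputs).last_hidden_state[:, 0]   # (1, 768) [CLS] vector
# note_vec can then be concatenated with structured vitals features for a
# downstream sepsis classifier.
```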
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality, and length of stay are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting; a generic pseudo-labeling loop is sketched below.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
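A generic pseudo-labeling loop of the kind self-training methods build on; the confidence threshold and stand-in model below are illustrative, not the paper's exact recipe:

```python
# Hedged sketch: keep only unlabeled examples the current model predicts
# with high confidence, then retrain on labeled + pseudo-labeled data.
import torch

def pseudo_label(model, unlabeled_batches, threshold=0.95):
    """Return confidently predicted inputs and their pseudo-labels."""
    model.eval()
    kept_x, kept_y = [], []
    with torch.no_grad():
        for x in unlabeled_batches:
            probs = torch.softmax(model(x), dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf >= threshold
            kept_x.append(x[keep])
            kept_y.append(labels[keep])
    return torch.cat(kept_x), torch.cat(kept_y)

# Demo with stand-ins: a linear "model" and one unlabeled batch.
model = torch.nn.Linear(10, 3)
xs, ys = pseudo_label(model, [torch.randn(16, 10)], threshold=0.4)
# Retraining on the augmented set typically uses strong regularization
# and augmentation, which is where this paper's improvements focus.
```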
- Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction [21.919977518774015]
We study how physiological time series data and clinical notes can be integrated into a unified mortality prediction model.
Our results show that a late fusion approach can provide a statistically significant improvement in mortality prediction over using either modality in isolation; a late-fusion sketch follows below.
arXiv Detail & Related papers (2020-03-24T18:25:49Z)
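A sketch of late fusion for mortality prediction: each modality is encoded separately, and the embeddings meet only at the final classifier. The GRU encoder and the 768-dimensional pooled note vector are assumed stand-ins, not this paper's exact architecture.

```python
# Hedged sketch: late fusion combines per-modality embeddings only at the
# final layer, in contrast to early fusion of raw inputs.
import torch
import torch.nn as nn

class LateFusionMortality(nn.Module):
    def __init__(self, n_feats=17, note_dim=768, hidden=128):
        super().__init__()
        self.ts_enc = nn.GRU(n_feats, hidden, batch_first=True)
        self.note_enc = nn.Linear(note_dim, hidden)
        self.head = nn.Linear(2 * hidden, 1)    # fusion happens only here

    def forward(self, ts, note_vec):
        _, h = self.ts_enc(ts)                  # h: (1, B, hidden)
        fused = torch.cat([h[-1], self.note_enc(note_vec)], dim=-1)
        return self.head(fused).squeeze(-1)     # logit for in-hospital mortality

model = LateFusionMortality()
logits = model(torch.randn(4, 48, 17), torch.randn(4, 768))  # 4 stays
```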
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.