Unsupervised Pre-Training on Patient Population Graphs for Patient-Level
Predictions
- URL: http://arxiv.org/abs/2203.12616v1
- Date: Wed, 23 Mar 2022 17:59:45 GMT
- Title: Unsupervised Pre-Training on Patient Population Graphs for Patient-Level
Predictions
- Authors: Chantal Pellegrini, Anees Kazi, Nassir Navab
- Abstract summary: Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph-based pre-training method helps in modeling the data at a population level.
- Score: 48.02011627390706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training has shown success in different areas of machine learning, such
as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
However, it has not been fully explored for clinical data analysis. Even though
an immense amount of Electronic Health Record (EHR) data is recorded, data and
labels can be scarce if the data is collected in small hospitals or deals with
rare diseases. In such scenarios, pre-training on a larger set of EHR data
could improve the model performance. In this paper, we apply unsupervised
pre-training to heterogeneous, multi-modal EHR data for patient outcome
prediction. To model this data, we leverage graph deep learning over population
graphs. We first design a network architecture based on a graph transformer
to handle various input feature types occurring in EHR data, like
continuous, discrete, and time-series features, allowing better multi-modal
data fusion. Further, we design pre-training methods based on masked imputation
to pre-train our network before fine-tuning on different end tasks.
Pre-training is done in a fully unsupervised fashion, which lays the groundwork
for pre-training on large public datasets with different tasks and similar
modalities in the future. We test our method on two medical datasets of patient
records, TADPOLE and MIMIC-III, including imaging and non-imaging features and
different prediction tasks. We find that our proposed graph-based pre-training
method helps in modeling the data at a population level and further improves
performance on the fine-tuning tasks in terms of AUC on average by 4.15% for
MIMIC and 7.64% for TADPOLE.
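A minimal sketch of the masked-imputation idea described in the abstract, written in plain Python. It is not the authors' implementation: the k-nearest-neighbour graph construction, the zero mask value, the 30% mask ratio, and the neighbour-mean imputer (a stand-in for the graph transformer) are all assumptions made for illustration, as are the function names.

```python
import random


def mask_features(features, mask_ratio=0.3, mask_value=0.0, seed=0):
    """Hide a random fraction of entries; return the masked copy and the mask."""
    rng = random.Random(seed)
    masked, mask = [], []
    for row in features:
        row_mask = [rng.random() < mask_ratio for _ in row]
        masked.append([mask_value if m else x for x, m in zip(row, row_mask)])
        mask.append(row_mask)
    return masked, mask


def knn_graph(features, k=2):
    """Assumed graph construction: link each patient to its k nearest patients."""
    edges = []
    for i, fi in enumerate(features):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(features)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def neighbour_mean_impute(masked, edges):
    """Stand-in for the network: predict each feature as the neighbours' mean."""
    neighbours = {i: [] for i in range(len(masked))}
    for i, j in edges:
        neighbours[i].append(j)
    preds = []
    for i, row in enumerate(masked):
        nbrs = neighbours[i]
        preds.append([
            sum(masked[j][d] for j in nbrs) / len(nbrs) if nbrs else row[d]
            for d in range(len(row))
        ])
    return preds


def masked_mse(predictions, targets, mask):
    """Pre-training loss: squared error averaged over masked entries only."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(predictions, targets, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy population of four patients with three continuous features each.
patients = [[0.1, 0.2, 0.3], [0.1, 0.25, 0.35],
            [0.9, 0.8, 0.7], [0.95, 0.85, 0.75]]
masked, mask = mask_features(patients, mask_ratio=0.3, seed=0)
edges = knn_graph(masked, k=1)
preds = neighbour_mean_impute(masked, edges)
loss = masked_mse(preds, patients, mask)
```

In a real pipeline the imputer would be a trainable model, and `loss` would be minimised over many random masks before fine-tuning on the labelled end task.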
Related papers
- MPLite: Multi-Aspect Pretraining for Mining Clinical Health Records [13.4100093553808]
We present a novel framework MPLite that utilizes Multi-aspect Pretraining with Lab results through a light-weight neural network to enhance medical concept representation.
We design a pretraining module that predicts medical codes based on lab results, ensuring robust prediction by fusing multiple aspects of features.
arXiv Detail & Related papers (2024-11-17T19:43:10Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model like BEHRT, a deep neural sequence model for EHR.
We compare the performance of a model trained with FL against a model trained on centralized data.
arXiv Detail & Related papers (2023-05-22T14:05:39Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
- Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks [3.1580072841682734]
This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset.
The effective knowledge transfer is completed through the task-aware fine-tuning stage.
We conducted experiments on a real-world claims dataset with more than one million patient records.
arXiv Detail & Related papers (2021-06-24T15:25:41Z)
- Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.