CEHR-BERT: Incorporating temporal information from structured EHR data
to improve prediction tasks
- URL: http://arxiv.org/abs/2111.08585v1
- Date: Wed, 10 Nov 2021 16:53:32 GMT
- Authors: Chao Pang (1), Xinzhuo Jiang (1), Krishna S Kalluri (1), Matthew
Spotnitz (1), RuiJun Chen (2), Adler Perotte (1), Karthik Natarajan (1) ((1)
Columbia University Irving Medical Center, (2) Geisinger)
- Abstract summary: We develop a new BERT adaptation, CEHR-BERT, to incorporate temporal information using a hybrid approach.
CEHR-BERT was trained on a subset of Columbia University Irving Medical Center-New York Presbyterian Hospital's clinical data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embedding algorithms are increasingly used to represent clinical concepts in
healthcare for improving machine learning tasks such as clinical phenotyping
and disease prediction. Recent studies have adapted state-of-the-art
bidirectional encoder representations from transformers (BERT) architecture to
structured electronic health records (EHR) data for the generation of
contextualized concept embeddings, yet do not fully incorporate temporal data
across multiple clinical domains. Therefore, we developed a new BERT adaptation,
CEHR-BERT, to incorporate temporal information using a hybrid approach by
augmenting the input to BERT using artificial time tokens, incorporating time,
age, and concept embeddings, and introducing a new second learning objective
for visit type. CEHR-BERT was trained on a subset of Columbia University Irving
Medical Center-New York Presbyterian Hospital's clinical data, which includes 2.4M
patients, spanning over three decades, and tested using 4-fold cross-validation
on the following prediction tasks: hospitalization, death, new heart failure
(HF) diagnosis, and HF readmission. Our experiments show that CEHR-BERT
outperformed existing state-of-the-art clinical BERT adaptations and baseline
models across all 4 prediction tasks in both ROC-AUC and PR-AUC. CEHR-BERT also
demonstrated strong transfer learning capability, as our model trained on only
5% of data outperformed comparison models trained on the entire data set.
Ablation studies to better understand the contribution of each time component
showed incremental gains with every element, suggesting that CEHR-BERT's
incorporation of artificial time tokens, time and age embeddings with concept
embeddings, and the addition of the second learning objective represents a
promising approach for future BERT-based clinical embeddings.
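The abstract mentions augmenting the BERT input with artificial time tokens but does not spell out the token vocabulary. The following is a minimal sketch of the general idea, not the authors' implementation: the token names (`W0`-`W3`, `M1`-`M11`, `LT`), the day thresholds, the helper names `time_token` and `build_sequence`, and the example concept codes are all assumptions made for illustration.

```python
from datetime import date


def time_token(gap_days: int) -> str:
    """Map the gap between two visits to a coarse time token (hypothetical scheme)."""
    if gap_days < 0:
        raise ValueError("visits must be in chronological order")
    if gap_days < 28:
        return f"W{gap_days // 7}"            # W0..W3: whole weeks elapsed
    if gap_days < 360:
        return f"M{max(1, gap_days // 30)}"   # M1..M11: approximate months elapsed
    return "LT"                               # long-term gap of roughly a year or more


def build_sequence(visits):
    """Flatten visits into one token list, inserting a time token between visits.

    `visits` is a list of (visit_date, [concept codes]) pairs.
    """
    tokens = []
    previous = None
    for visit_date, concepts in sorted(visits, key=lambda v: v[0]):
        if previous is not None:
            tokens.append(time_token((visit_date - previous).days))
        tokens.extend(concepts)
        previous = visit_date
    return tokens


# Illustrative patient history; the codes are placeholders, not real data.
example = build_sequence([
    (date(2020, 1, 1), ["I10", "E11"]),
    (date(2020, 1, 10), ["I50"]),
    (date(2021, 6, 1), ["N18"]),
])
print(example)
```

In a sequence built this way, the time tokens sit in the same vocabulary as the clinical concepts, so a masked-language-model objective can learn from the spacing of visits as well as from their content. The time and age embeddings and the visit-type objective described in the abstract are separate components and are not covered by this sketch.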
Related papers
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic
Health Records [11.408950540503112]
We propose a LAbel-efficienT incidenT phEnotyping algorithm to accurately annotate the timing of clinical events from longitudinal EHR data.
LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis.
arXiv Detail & Related papers (2023-05-19T03:28:51Z)
- sEHR-CE: Language modelling of structured EHR data for efficient and
generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for
Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal
Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Bidirectional Representation Learning from Transformers using Multimodal
Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model produced the largest increase in precision-recall area under the curve (PRAUC), from 0.70 to 0.76, for depression prediction relative to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Med-BERT: pre-trained contextualized embeddings on large-scale
structured electronic health records for disease prediction [12.669003066030697]
We propose Med-BERT, which adapts the BERT framework for pre-training contextualized embedding models on structured diagnosis data from an EHR dataset of 28,490,650 patients.
Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 2.02-7.12%.
In particular, pre-trained Med-BERT substantially improves performance on tasks with very small fine-tuning training sets (300-500 samples), boosting the AUC by more than 20% or matching the AUC obtained with a training set 10 times larger.
arXiv Detail & Related papers (2020-05-22T05:07:17Z)