Related papers: CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks

CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks

URL: http://arxiv.org/abs/2111.08585v1
Date: Wed, 10 Nov 2021 16:53:32 GMT
Title: CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks
Authors: Chao Pang (1), Xinzhuo Jiang (1), Krishna S Kalluri (1), Matthew Spotnitz (1), RuiJun Chen (2), Adler Perotte (1), Karthik Natarajan (1) ((1) Columbia University Irving Medical Center, (2) Geisinger)
Abstract summary: We develop a new BERT adaptation, CEHR-BERT, to incorporate temporal information using a hybrid approach. CEHR-BERT was trained on a subset of Columbia University Irving Medical Center-York Presbyterian Hospital's clinical data.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Embedding algorithms are increasingly used to represent clinical concepts in healthcare for improving machine learning tasks such as clinical phenotyping and disease prediction. Recent studies have adapted state-of-the-art bidirectional encoder representations from transformers (BERT) architecture to structured electronic health records (EHR) data for the generation of contextualized concept embeddings, yet do not fully incorporate temporal data across multiple clinical domains. Therefore we developed a new BERT adaptation, CEHR-BERT, to incorporate temporal information using a hybrid approach by augmenting the input to BERT using artificial time tokens, incorporating time, age, and concept embeddings, and introducing a new second learning objective for visit type. CEHR-BERT was trained on a subset of Columbia University Irving Medical Center-York Presbyterian Hospital's clinical data, which includes 2.4M patients, spanning over three decades, and tested using 4-fold cross-validation on the following prediction tasks: hospitalization, death, new heart failure (HF) diagnosis, and HF readmission. Our experiments show that CEHR-BERT outperformed existing state-of-the-art clinical BERT adaptations and baseline models across all 4 prediction tasks in both ROC-AUC and PR-AUC. CEHR-BERT also demonstrated strong transfer learning capability, as our model trained on only 5% of data outperformed comparison models trained on the entire data set. Ablation studies to better understand the contribution of each time component showed incremental gains with every element, suggesting that CEHR-BERT's incorporation of artificial time tokens, time and age embeddings with concept embeddings, and the addition of the second learning objective represents a promising approach for future BERT-based clinical embeddings.

Related papers

Electrocardiogram (ECG) Based Cardiac Arrhythmia Detection and Classification using Machine Learning Algorithms [0.0]
Machine Learning (ML) and Deep Learning (DL) have opened new prospects in medical sciences for improved diagnosis, prognosis, and treatment of severe health conditions. This paper focuses on the development of an ML model with high predictive accuracy to classify arrhythmic electrocardiogram (ECG) signals.
arXiv Detail & Related papers (2024-12-07T08:29:44Z)
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials. We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records [11.408950540503112]
We propose a LAbel-efficienT incidenT phEnotyping algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis.
arXiv Detail & Related papers (2023-05-19T03:28:51Z)
sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets. We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records. We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data. We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure. We incorporate these pre-trained representations into a graph neural network to detect disease complications. We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression. The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey. An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys. We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community. We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence. We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction [12.669003066030697]
We propose Med-BERT, which adapts the BERT framework for pre-training contextualized embedding models on structured diagnosis data from 28,490,650 patients EHR dataset. Med-BERT substantially improves prediction accuracy, boosting the area under receiver operating characteristics curve (AUC) by 2.02-7.12%. In particular, pre-trained Med-BERT substantially improves the performance of tasks with very small fine-tuning training sets (300-500 samples) boosting the AUC by more than 20% or equivalent to the AUC of 10 times larger training set.
arXiv Detail & Related papers (2020-05-22T05:07:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.