Extracting Lifestyle Factors for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision
- URL: http://arxiv.org/abs/2101.09244v2
- Date: Mon, 25 Jan 2021 03:42:00 GMT
- Title: Extracting Lifestyle Factors for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision
- Authors: Zitao Shen, Yoonkwon Yi, Anusha Bompelli, Fang Yu, Yanshan Wang, Rui Zhang
- Abstract summary: The objective of the study was to demonstrate the feasibility of natural language processing (NLP) models to classify lifestyle factors.
We performed two case studies: physical activity and excessive diet, in order to validate the effectiveness of BERT models.
The proposed approach leveraging weak supervision could significantly increase the sample size.
- Score: 9.53786612243512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since no effective therapies exist for Alzheimer's disease (AD), prevention
has become more critical through lifestyle factor changes and interventions.
Analyzing electronic health records (EHR) of patients with AD can help us
better understand lifestyle's effect on AD. However, lifestyle information is
typically stored in clinical narratives. Thus, the objective of the study was
to demonstrate the feasibility of natural language processing (NLP) models to
classify lifestyle factors (e.g., physical activity and excessive diet) from
clinical texts. We automatically generated labels for the training data by
using a rule-based NLP algorithm. We conducted weak supervision for pre-trained
Bidirectional Encoder Representations from Transformers (BERT) models on the
weakly labeled training corpus. These models include the BERT base model,
PubMedBERT (abstracts + full text), PubMedBERT (abstracts only), Unified Medical
Language System (UMLS) BERT, Bio BERT, and Bio-clinical BERT. We performed two
case studies: physical activity and excessive diet, in order to validate the
effectiveness of BERT models in classifying lifestyle factors for AD. These
models were compared on the developed Gold Standard Corpus (GSC) on the two
case studies. The PubMedBERT (abstracts only) model achieved the best performance for
physical activity, with precision, recall, and F-1 scores of 0.96, 0.96,
and 0.96, respectively. For classifying excessive diet, the Bio BERT
model showed the highest performance with perfect precision, recall, and F-1
scores. The proposed approach leveraging weak supervision could significantly
increase the sample size, which is required for training the deep learning
models. The study also demonstrates the effectiveness of BERT models for
extracting lifestyle factors for Alzheimer's disease from clinical notes.
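The weak-supervision step described in the abstract — automatically labeling training notes with a rule-based NLP algorithm before fine-tuning BERT — can be sketched as a simple keyword labeler. The actual rule set used by the authors is not given in the abstract; the lexicons and regular expressions below are hypothetical illustrations for the paper's two case studies (physical activity and excessive diet).

```python
import re

# Hypothetical keyword lexicons per lifestyle factor; the authors' real
# rule-based algorithm is not published in the abstract.
RULES = {
    "physical_activity": re.compile(
        r"\b(exercis\w*|walk(s|ing|ed)?|jog(ging)?|swim(ming)?|gym|aerobic\w*)\b",
        re.IGNORECASE,
    ),
    "excessive_diet": re.compile(
        r"\b(overeat\w*|binge eating|excessive (caloric|calorie|food) intake"
        r"|high[- ]fat diet)\b",
        re.IGNORECASE,
    ),
}

def weak_label(note: str) -> dict:
    """Assign a 0/1 weak label per lifestyle factor to one clinical note."""
    return {factor: int(bool(pat.search(note))) for factor, pat in RULES.items()}

# Toy clinical-note snippets (invented for illustration).
notes = [
    "Patient reports walking 30 minutes daily and attends the gym twice a week.",
    "Family notes binge eating episodes and a high-fat diet over the past year.",
    "No acute distress; medications reconciled.",
]
labels = [weak_label(n) for n in notes]
```

Labels produced this way are noisy but cheap, which is how weak supervision enlarges the training corpus; the weakly labeled notes would then be fed to a BERT-family sequence classifier, with the manually annotated Gold Standard Corpus reserved for evaluation.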
Related papers
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [50.80532910808962]
We present GluFormer, a generative foundation model on biomedical temporal data based on a transformer architecture.
GluFormer generalizes to 15 different external datasets, including 4936 individuals across 5 different geographical regions.
It can also predict onset of future health outcomes even 4 years in advance.
arXiv Detail & Related papers (2024-08-20T13:19:06Z)
- Development and Validation of a Deep-Learning Model for Differential Treatment Benefit Prediction for Adults with Major Depressive Disorder Deployed in the Artificial Intelligence in Depression Medication Enhancement (AIDME) Study [0.622895724042048]
The pharmacological treatment of Major Depressive Disorder (MDD) relies on a trial-and-error approach.
We introduce an artificial intelligence (AI) model aiming to personalize treatment outcomes.
arXiv Detail & Related papers (2024-06-07T15:04:59Z)
- Nurse-in-the-Loop Artificial Intelligence for Precision Management of Type 2 Diabetes in a Clinical Trial Utilizing Transfer-Learned Predictive Digital Twin [5.521385406191426]
The study developed an online nurse-in-the-loop predictive control (ONLC) model that utilizes a predictive digital twin (PDT).
The PDT was trained on participants' self-monitoring data (weight, food logs, physical activity, glucose) from the first three months.
The ONLC provided the intervention group with individualized feedback and recommendations via text messages.
arXiv Detail & Related papers (2024-01-05T06:38:50Z)
- Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction [0.0]
We implement and evaluate several classification models, including a BERT base model, Bio+Clinical BERT, and a simpler CNN.
Results indicate that the medical domain-specific Bio+Clinical BERT model significantly outperformed the general domain base BERT model.
Future research could explore how to capitalize on the specific strengths of each model.
arXiv Detail & Related papers (2023-08-02T20:01:38Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- A Study of Social and Behavioral Determinants of Health in Lung Cancer Patients Using Transformers-based Natural Language Processing Models [23.68697811086486]
Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health.
There are limited studies to examine SBDoH factors in clinical outcomes due to the lack of structured SBDoH information in current electronic health record systems.
Natural language processing (NLP) is thus the key technology to extract such information from unstructured clinical text.
arXiv Detail & Related papers (2021-08-10T22:11:31Z)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR.
arXiv Detail & Related papers (2021-04-15T20:40:05Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem to the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.