A Study of Social and Behavioral Determinants of Health in Lung Cancer
Patients Using Transformers-based Natural Language Processing Models
- URL: http://arxiv.org/abs/2108.04949v1
- Date: Tue, 10 Aug 2021 22:11:31 GMT
- Title: A Study of Social and Behavioral Determinants of Health in Lung Cancer
Patients Using Transformers-based Natural Language Processing Models
- Authors: Zehao Yu, Xi Yang, Chong Dang, Songzi Wu, Prakash Adekkanattu,
Jyotishman Pathak, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian,
Yonghui Wu
- Abstract summary: Social and behavioral determinants of health (SBDoH) have important roles in shaping people's health.
There are limited studies to examine SBDoH factors in clinical outcomes due to the lack of structured SBDoH information in current electronic health record systems.
Natural language processing (NLP) is thus the key technology to extract such information from unstructured clinical text.
- Score: 23.68697811086486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social and behavioral determinants of health (SBDoH) have important roles in
shaping people's health. In clinical research studies, especially comparative
effectiveness studies, failure to adjust for SBDoH factors will potentially
cause confounding issues and misclassification errors in both statistical
analyses and machine learning-based models. However, there are limited studies
to examine SBDoH factors in clinical outcomes due to the lack of structured
SBDoH information in current electronic health record (EHR) systems, while much
of the SBDoH information is documented in clinical narratives. Natural language
processing (NLP) is thus the key technology to extract such information from
unstructured clinical text. However, there is not a mature clinical NLP system
focusing on SBDoH. In this study, we examined two state-of-the-art
transformer-based NLP models, BERT and RoBERTa, to extract SBDoH
concepts from clinical narratives, applied the best-performing model to extract
SBDoH concepts from a lung cancer screening patient cohort, and examined the
differences in SBDoH information between NLP-extracted results and structured
EHRs (SBDoH information captured in standard vocabularies such as the
International Classification of Diseases codes). The experimental results show
that the BERT-based NLP model achieved the best strict and lenient F1-scores of
0.8791 and 0.8999, respectively. The comparison between NLP-extracted SBDoH
information and structured EHRs in the lung cancer patient cohort of 864
patients with 161,933 clinical notes of various types showed that much more
detailed information about smoking, education, and employment was captured
only in clinical narratives and that it is necessary to use both clinical
narratives and structured EHRs to construct a more complete picture of
patients' SBDoH factors.
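The abstract reports strict and lenient F1-scores without defining them. A minimal sketch, assuming the common clinical NLP convention in which a strict match requires identical span boundaries and concept type, while a lenient match requires only overlapping boundaries with the same type. The `Span` type, the `f1_score` function, and the sample offsets are hypothetical illustrations, not the authors' evaluation code:

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # concept type, e.g. "Smoking" (hypothetical label names)


def _match(pred: Span, gold: Span, lenient: bool) -> bool:
    """Return True if a predicted span counts as a hit for a gold span."""
    if pred.label != gold.label:
        return False
    if lenient:
        # Lenient: any character overlap is enough.
        return pred.start < gold.end and gold.start < pred.end
    # Strict: boundaries must be identical.
    return pred.start == gold.start and pred.end == gold.end


def f1_score(predicted: List[Span], gold: List[Span], lenient: bool = False) -> float:
    """Compute F1 over concept spans under strict or lenient matching."""
    hits_pred = sum(any(_match(p, g, lenient) for g in gold) for p in predicted)
    hits_gold = sum(any(_match(p, g, lenient) for p in predicted) for g in gold)
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one note: three gold concepts, three predictions.
gold = [Span(0, 13, "Smoking"), Span(20, 30, "Education"), Span(40, 50, "Employment")]
pred = [Span(0, 13, "Smoking"), Span(22, 30, "Education"), Span(60, 65, "Employment")]

strict_f1 = f1_score(pred, gold, lenient=False)
lenient_f1 = f1_score(pred, gold, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy example the "Education" prediction overlaps the gold span without matching its boundaries, so it counts only under lenient matching. This is why a lenient score is never lower than the corresponding strict score.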
Related papers
- Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z)
- Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model [0.7373617024876725]
Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants.
The complex nature of unstructured medical texts presents challenges in efficiently identifying participants.
In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task.
arXiv Detail & Related papers (2024-04-24T20:42:28Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction [8.625186194860696]
We provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality.
To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes.
We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT.
arXiv Detail & Related papers (2022-08-09T03:49:52Z)
- Intelligent Sight and Sound: A Chronic Cancer Pain Dataset [74.77784420691937]
This paper introduces the first chronic cancer pain dataset, collected as part of the Intelligent Sight and Sound (ISS) clinical trial.
The data collected to date consists of 29 patients, 509 smartphone videos, 189,999 frames, and self-reported affective and activity pain scores.
Using static images and multi-modal data to predict self-reported pain levels, early models show significant gaps between current methods available to predict pain.
arXiv Detail & Related papers (2022-04-07T22:14:37Z)
- A causal learning framework for the analysis and interpretation of COVID-19 clinical data [7.256237785391623]
The workflow consists of a multi-step approach that goes from identifying the main causes of the patient's outcome through BSL.
We evaluate our approach on a feature-rich COVID-19 dataset, showing that the proposed framework provides a schematic overview of the multi-factorial processes that jointly contribute to the outcome.
Our approach yields a highly interpretable tool that correctly predicts the outcome of 85% of subjects based exclusively on 3 features.
arXiv Detail & Related papers (2021-05-14T15:58:18Z)
- Extracting Lifestyle Factors for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision [9.53786612243512]
The objective of the study was to demonstrate the feasibility of natural language processing (NLP) models to classify lifestyle factors.
We performed two case studies: physical activity and excessive diet, in order to validate the effectiveness of BERT models.
The proposed approach leveraging weak supervision could significantly increase the sample size.
arXiv Detail & Related papers (2021-01-22T17:55:03Z)
- Classification supporting COVID-19 diagnostics based on patient survey data [82.41449972618423]
Logistic regression and XGBoost classifiers that allow for effective screening of patients for COVID-19 were generated.
The obtained classification models provided the basis for the DECODE service (decode.polsl.pl), which can serve as support in screening patients with COVID-19 disease.
This data set consists of more than 3,000 examples and is based on questionnaires collected at a hospital in Poland.
arXiv Detail & Related papers (2020-11-24T17:44:01Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)