SODA: A Natural Language Processing Package to Extract Social
Determinants of Health for Cancer Studies
- URL: http://arxiv.org/abs/2212.03000v2
- Date: Thu, 18 May 2023 18:39:20 GMT
- Title: SODA: A Natural Language Processing Package to Extract Social
Determinants of Health for Cancer Studies
- Authors: Zehao Yu, Xi Yang, Chong Dang, Prakash Adekkanattu, Braja Gopal Patra,
Yifan Peng, Jyotishman Pathak, Debbie L. Wilson, Ching-Yuan Chang, Wei-Hsuan
Lo-Ciganic, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian, Yonghui
Wu
- Abstract summary: We aim to develop an open-source package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients.
We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort.
We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed with opioids, and explored customization strategies to improve performance.
- Score: 34.24528053846599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: We aim to develop an open-source natural language processing (NLP)
package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models
to extract social determinants of health (SDoH) for cancer patients, examine
the generalizability of SODA to a new disease domain (i.e., opioid use), and
evaluate the extraction rate of SDoH using cancer populations.
Methods: We identified SDoH categories and attributes and developed an SDoH
corpus using clinical notes from a general cancer cohort. We compared four
transformer-based NLP models to extract SDoH, examined the generalizability of
NLP models to a cohort of patients prescribed with opioids, and explored
customization strategies to improve performance. We applied the best NLP model
to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804),
and colorectal cancer (n=6,240) cohorts.
Results and Conclusion: We developed a corpus of 629 cancer patients notes
with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH.
The Bidirectional Encoder Representations from Transformers (BERT) model
achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH
concept extraction, 0.9617 and 0.9626 for linking attributes to SDoH concepts.
Fine-tuning the NLP models using new annotations from opioid use patients
improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The
extraction rates among 19 categories of SDoH varied greatly, where 10 SDoH
could be extracted from >70% of cancer patients, but 9 SDoH had a low
extraction rate (<70% of cancer patients). The SODA package with pre-trained
transformer models is publicly available at
https://github.com/uf-hobiinformatics-lab/SDoH_SODA.
Related papers
- Multi-modal AI for comprehensive breast cancer prognostication [18.691704371847855]
We developed a test for breast cancer patient stratification based on digital pathology and clinical characteristics using novel AI methods.
The test was developed and evaluated using data from a total of 8,161 breast cancer patients across 15 cohorts.
Results suggest that our AI test can improve accuracy, extend applicability to a wider range of patients, and enhance access to treatment selection tools.
arXiv Detail & Related papers (2024-10-28T17:54:29Z) - Improving Fairness of Automated Chest X-ray Diagnosis by Contrastive
Learning [19.948079693716075]
Our proposed AI model utilizes supervised contrastive learning to minimize bias in CXR diagnosis.
We evaluated the methods on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77,887 CXR images and the NIH Chest X-ray dataset with 112,120 CXR images.
arXiv Detail & Related papers (2024-01-25T20:03:57Z) - Improving Precancerous Case Characterization via Transformer-based
Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has been focused on detecting cancer cases.
Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention.
Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
arXiv Detail & Related papers (2022-12-10T00:06:28Z) - WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic
Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
First place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919)
arXiv Detail & Related papers (2022-04-13T15:27:05Z) - A Deep Learning Based Workflow for Detection of Lung Nodules With Chest
Radiograph [0.0]
We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches.
These labeled patches were then used to train finetune a deep neural network(DNN) model, classifying the patches as positive or negative.
arXiv Detail & Related papers (2021-12-19T16:19:46Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z) - Automated Quantification of CT Patterns Associated with COVID-19 from
Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions.
The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities.
Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z) - Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale
Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients.
We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports.
We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.