Related papers: SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies

URL: http://arxiv.org/abs/2212.03000v2
Date: Thu, 18 May 2023 18:39:20 GMT
Title: SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies
Authors: Zehao Yu, Xi Yang, Chong Dang, Prakash Adekkanattu, Braja Gopal Patra, Yifan Peng, Jyotishman Pathak, Debbie L. Wilson, Ching-Yuan Chang, Wei-Hsuan Lo-Ciganic, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian, Yonghui Wu
Abstract summary: We aim to develop an open-source package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients. We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance.
Score: 34.24528053846599
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients' notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, and 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models using new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among the 19 categories of SDoH varied greatly: 10 SDoH could be extracted from >70% of cancer patients, but 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.
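
The abstract reports paired strict/lenient F1 scores. The following is a minimal sketch of how such a pair is commonly computed for span extraction, not the paper's own scoring code. It assumes that a strict match requires identical boundaries and category, and that a lenient match requires only an overlapping span with the same category; the abstract does not define either term. The `Span` layout, the function names, and the category labels in the example are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, category).
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a category and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """F1 over extracted spans; assumes no duplicate spans in either list."""
    if lenient:
        # Assumption: any overlap with a same-category span counts as a match.
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        # Assumption: strict means exact boundaries and exact category.
        gold_set, pred_set = set(gold), set(pred)
        matched_pred = sum(p in gold_set for p in pred)
        matched_gold = sum(g in pred_set for g in gold)

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative annotations only; offsets and labels are made up.
gold = [(0, 7, "Tobacco"), (20, 27, "Alcohol"), (40, 50, "Employment")]
pred = [(0, 7, "Tobacco"), (18, 27, "Alcohol"), (60, 65, "Education")]

strict = span_f1(gold, pred)
relaxed = span_f1(gold, pred, lenient=True)
print(f"strict={strict:.4f} lenient={relaxed:.4f}")
```

In this toy case the second prediction has the right category but a boundary that starts two characters early, so it counts only under the lenient rule. That is the kind of gap that separates the two scores in each reported pair.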

Related papers

Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection [1.5391321019692428]
This research aims to integrate semantic features derived from radiologists' assessments of nodules, allowing the model to learn clinically relevant, robust, and explainable features for predicting lung cancer. We fine-tuned a pretrained Contrastive Language-Image Pretraining model with a parameter-efficient fine-tuning approach to align imaging and semantic features and predict the one-year lung cancer diagnosis. Our model demonstrated an AUROC of 0.90 and AUPRC of 0.78, outperforming baseline state-of-the-art models on external datasets.
arXiv Detail & Related papers (2025-04-30T06:11:34Z)
Multi-modal AI for comprehensive breast cancer prognostication [18.691704371847855]
We developed a test for breast cancer patient stratification based on digital pathology and clinical characteristics using novel AI methods. The test was developed and evaluated using data from a total of 8,161 breast cancer patients across 15 cohorts. Results suggest that our AI test can improve accuracy, extend applicability to a wider range of patients, and enhance access to treatment selection tools.
arXiv Detail & Related papers (2024-10-28T17:54:29Z)
Improving Fairness of Automated Chest X-ray Diagnosis by Contrastive Learning [19.948079693716075]
Our proposed AI model utilizes supervised contrastive learning to minimize bias in CXR diagnosis. We evaluated the methods on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77,887 CXR images and the NIH Chest X-ray dataset with 112,120 CXR images.
arXiv Detail & Related papers (2024-01-25T20:03:57Z)
Improving Precancerous Case Characterization via Transformer-based Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has been focused on detecting cancer cases. Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention. Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
arXiv Detail & Related papers (2022-12-10T00:06:28Z)
WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels. The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919).
arXiv Detail & Related papers (2022-04-13T15:27:05Z)
A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph [0.0]
We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches. These labeled patches were then used to fine-tune a deep neural network (DNN) model, classifying the patches as positive or negative.
arXiv Detail & Related papers (2021-12-19T16:19:46Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic. The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands. We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports. We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)
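
The WSSS4LUAD entry in the list above reports an overall mIoU alongside three per-class IoU values. As a quick consistency check, the sketch below assumes the mIoU is the unweighted mean of the per-class values. The entry does not state the averaging rule, so this is an assumption, but under it the figures agree to four decimal places.

```python
# Per-class IoU values copied from the WSSS4LUAD entry in this list.
per_class_iou = {"tumor": 0.8389, "stroma": 0.7931, "normal": 0.8919}

# Assumption: mIoU is the unweighted mean over the three tissue classes.
miou = sum(per_class_iou.values()) / len(per_class_iou)
print(round(miou, 4))  # 0.8413
```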

This list is automatically generated from the titles and abstracts of the papers on this site.