SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)
- URL: http://arxiv.org/abs/2407.17126v1
- Date: Wed, 24 Jul 2024 09:57:51 GMT
- Title: SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)
- Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding
- Abstract summary: We introduce SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method to extract social determinants of health from medical notes.
It achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, as measured by Cohen's kappa of up to 0.92.
This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.
- Score: 43.79125048893811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study, we introduce SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method that leverages contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, as measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining AUROC scores of 0.90 or higher. Testing across three distinct datasets has confirmed its robustness and accuracy. This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.
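The abstract reports agreement with human annotators as Cohen's kappa and the SDoH-GPT + XGBoost classifier's performance as AUROC. As a reminder of what those two numbers measure, here is a minimal, dependency-free sketch of both metrics. Kappa is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. AUROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative. The label names and scores below are made-up toy data, not results from the paper, and this is not the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement p_e: chance overlap given each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label everywhere.
        # Kappa is undefined here; this sketch simply reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical SDoH labels from an LLM and from a human annotator.
    llm = ["housing", "none", "employment", "none", "housing", "none"]
    human = ["housing", "none", "employment", "housing", "housing", "none"]
    print(round(cohens_kappa(llm, human), 3))  # 0.739

    # Hypothetical binary outcomes and classifier probabilities.
    y_true = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.7, 0.3, 0.8, 0.6, 0.1]
    print(round(auroc(y_true, y_score), 3))  # 0.778
```

The pairwise AUROC loop is quadratic in the number of examples and is meant only to make the definition concrete; for real evaluations a rank-based or library implementation would be the usual choice.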
Related papers
- Weakly Supervised Intracranial Hemorrhage Segmentation with YOLO and an Uncertainty Rectified Segment Anything Model [0.5578116134031106]
Intracranial hemorrhage (ICH) is a life-threatening condition that requires rapid and accurate diagnosis to improve treatment outcomes and patient survival rates.
Recent advancements in supervised deep learning have greatly improved the analysis of medical images.
To mitigate the need for large amounts of expert-prepared segmentation data, we have developed a novel weakly supervised ICH segmentation method.
(2024-07-29T23:40:13Z)
- Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods [17.83326146480516]
Social determinants of health (SDoH) play a critical role in shaping health outcomes.
We present a novel annotated corpus, the Pediatric Social History Corpus (PedSHAC).
We evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods.
(2024-03-31T23:37:18Z)
- ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models [4.247764575421617]
Large Language Models (LLMs) exhibit exceptional performance in determining medical indicators.
Our specially adapted GPT models demonstrated remarkable proficiency, achieving less than 1 bpm error in cycle count.
This study highlights LLMs' dual role as health data analysis tools and pivotal elements in advanced AI health assistants.
(2023-11-21T11:09:57Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
(2023-05-30T22:05:11Z)
- A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
The impact of social determinants of health (SDoH) on patients' healthcare quality and on health disparities is well known.
Many SDoH items are not coded in structured forms in electronic health records.
We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
(2022-12-24T18:40:23Z)
- Predicting Patient Readmission Risk from Medical Text via Knowledge Graph Enhanced Multiview Graph Convolution [67.72545656557858]
We propose a new method that uses the medical text of electronic health records (EHRs) to predict readmission risk.
We represent discharge summaries of patients with multiview graphs enhanced by an external knowledge graph.
Experimental results prove the effectiveness of our method, yielding state-of-the-art performance.
(2021-12-19T01:45:57Z)
- Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
(2021-11-08T07:57:30Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(2021-04-07T06:02:04Z)
- Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions.
The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities.
Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions in Canada, Europe, and the United States.
(2020-04-02T21:49:14Z)