Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
- URL: http://arxiv.org/abs/2404.00826v2
- Date: Thu, 4 Apr 2024 04:21:34 GMT
- Title: Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
- Authors: Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Ozlem Uzuner, Meliha Yetisgen
- Abstract summary: Social determinants of health (SDoH) play a critical role in shaping health outcomes.
We present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC).
We evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods.
- Score: 17.83326146480516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuned LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.
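The abstract reports both annotator agreement and extraction performance as F1 over event triggers and event arguments. As a minimal sketch of how such span-level scores are commonly computed, the snippet below assumes exact matching on (label, start offset, end offset) and micro-averaging over all spans; the paper's actual matching criteria are not stated here and may differ (for example, arguments may be scored relative to their trigger). The `Span` class, the `prf1` function, and the example labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Span:
    """A labeled character span (hypothetical representation).

    frozen=True makes instances hashable, so spans can be compared as sets.
    """
    label: str  # hypothetical label, e.g. a trigger type such as "Alcohol"
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def prf1(gold: Iterable[Span], pred: Iterable[Span]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # spans that match exactly on label and offsets
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with four gold spans and three predicted spans of which two match exactly, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571). A predicted span whose offsets are off by a single character counts as a miss under this strict criterion; a lenient variant would accept overlapping spans instead.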
Related papers
- Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction [8.345409741431785]
Role-playing improves F1 scores in 95% of cases across 14 different socio-demographic backgrounds.
Applying role-playing with in-context learning outperformed the previous state-of-the-art models.
(arXiv, 2024-08-10)
- SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH) [43.79125048893811]
We introduce SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method to extract social determinants of health from medical notes.
It achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, measured by Cohen's kappa of up to 0.92.
This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.
(arXiv, 2024-07-24)
- Prompt-based Extraction of Social Determinants of Health Using Few-shot Learning [3.418600863629033]
Social determinants of health (SDOH) documented in the electronic health record are being studied to understand how SDOH impacts patient health outcomes.
In this work, we utilize the Social History Corpus (SHAC), a multi-institutional corpus of de-identified social history sections annotated for SDOH, including substance use, employment, and living status information.
We explore the automatic extraction of SDOH information with SHAC in both standoff and inline annotation formats using GPT-4 in a one-shot prompting setting.
Our prompt-based GPT-4 method achieved an overall F1 of 0.652 on the SHAC test set.
(arXiv, 2023-06-12)
- A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
The impact of social determinants of health (SDoH) on patients' healthcare quality and on health disparities is well known.
Many SDoH items are not coded in structured form in electronic health records.
We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
(arXiv, 2022-12-24)
- Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
(arXiv, 2022-12-13)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
(arXiv, 2022-10-23)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
(arXiv, 2021-04-07)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
(arXiv, 2021-02-08)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
(arXiv, 2020-09-24)
- Annotating Social Determinants of Health Using Active Learning, and Characterizing Determinants Using Neural Event Extraction [11.845850292404768]
Social determinants of health (SDOH) affect health outcomes, and knowledge of SDOH can inform clinical decision-making.
This work presents a new corpus with SDOH annotations, a novel active learning framework, and the first extraction results on the new corpus.
(arXiv, 2020-04-11)