SDOH-NLI: a Dataset for Inferring Social Determinants of Health from
Clinical Notes
- URL: http://arxiv.org/abs/2310.18431v1
- Date: Fri, 27 Oct 2023 19:09:30 GMT
- Title: SDOH-NLI: a Dataset for Inferring Social Determinants of Health from
Clinical Notes
- Authors: Adam D. Lelkes, Eric Loreaux, Tal Schuster, Ming-Jun Chen, Alvin
Rajkomar
- Abstract summary: Social and behavioral determinants of health (SDOH) play a significant role in shaping health outcomes.
Progress on using NLP methods for this task has been hindered by the lack of high-quality publicly available labeled data.
This paper introduces a new dataset, SDOH-NLI, that is based on publicly available notes and which we release publicly.
- Score: 13.991819517682574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social and behavioral determinants of health (SDOH) play a significant role
in shaping health outcomes, and extracting these determinants from clinical
notes is a first step to help healthcare providers systematically identify
opportunities to provide appropriate care and address disparities. Progress on
using NLP methods for this task has been hindered by the lack of high-quality
publicly available labeled data, largely due to the privacy and regulatory
constraints on the use of real patients' information. This paper introduces a
new dataset, SDOH-NLI, that is based on publicly available notes and which we
release publicly. We formulate SDOH extraction as a natural language inference
(NLI) task, and provide binary textual entailment labels obtained from human
raters for a cross product of a set of social history snippets as premises and
SDOH factors as hypotheses. Our dataset differs from standard NLI benchmarks in
that our premises and hypotheses are obtained independently. We evaluate both
"off-the-shelf" entailment models as well as models fine-tuned on our data, and
highlight the ways in which our dataset appears more challenging than commonly
used NLI datasets.
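As a rough illustration of the cross-product formulation described in the abstract, the sketch below pairs every premise with every hypothesis and leaves the binary label empty for a human rater. The snippets, the hypotheses, and the `build_pairs` helper are hypothetical stand-ins, not taken from SDOH-NLI or the authors' code.

```python
from itertools import product

# Hypothetical social history snippets (premises); not taken from SDOH-NLI.
premises = [
    "Patient lives alone and smokes one pack per day.",
    "Denies alcohol use. Works as a teacher.",
]

# Hypothetical SDOH factors phrased as hypotheses; not taken from SDOH-NLI.
hypotheses = [
    "The person smokes.",
    "The person is employed.",
    "The person drinks alcohol.",
]


def build_pairs(premises, hypotheses):
    """Return every (premise, hypothesis) pair, i.e. the full cross product."""
    return [
        # The label is left empty here; in the paper it comes from human raters.
        {"premise": p, "hypothesis": h, "label": None}
        for p, h in product(premises, hypotheses)
    ]


pairs = build_pairs(premises, hypotheses)
print(len(pairs))  # number of pairs = len(premises) * len(hypotheses)
```

Because premises and hypotheses are drawn independently in this setup, many pairs will be unrelated, so a labeled set built this way would likely contain far more non-entailed than entailed examples.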
Related papers
- Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction [4.042918413611158]
Social determinants of health (SDOH) play an important role in health outcomes.
Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH.
Large language models (LLMs) have shown promise at automatically annotating structured data.
arXiv Detail & Related papers (2024-07-12T21:14:06Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Your Model Is Not Predicting Depression Well And That Is Why: A Case
Study of PRIMATE Dataset [0.0]
This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts.
Our study reveals concerns regarding annotation validity, particularly for the "lack of interest or pleasure" symptom.
Our refined annotations, to be released under a Data Use Agreement, offer a higher-quality test set for anhedonia detection.
arXiv Detail & Related papers (2024-03-01T10:47:02Z)
- Evaluating the Impact of Social Determinants on Health Prediction in the
Intensive Care Unit [10.764842579064636]
Social determinants of health (SDOH) play a crucial role in a person's health and well-being.
Most risk prediction models based on electronic health records do not incorporate a comprehensive set of SDOH features.
Our work links a publicly available EHR database, MIMIC-IV, to well-documented SDOH features.
arXiv Detail & Related papers (2023-05-22T01:27:51Z)
- Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
- Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Leveraging Natural Language Processing to Augment Structured Social
Determinants of Health Data in the Electronic Health Record [1.7812428873698403]
Social determinants of health (SDOH) impact health outcomes.
Clinical notes often contain more comprehensive SDOH information.
We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture.
arXiv Detail & Related papers (2022-12-14T22:51:49Z)
- Healthsheet: Development of a Transparency Artifact for Health Datasets [13.57051456780329]
We introduce Healthsheet, a contextualized adaptation of the original datasheet questionnaire (Gebru et al., 2018) for health-specific applications.
We work with three publicly-available healthcare datasets as our case studies.
arXiv Detail & Related papers (2022-02-26T01:05:55Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language
Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- An Empirical Survey of Data Augmentation for Limited Data Learning in
NLP [88.65488361532158]
Dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.