Detecting the Clinical Features of Difficult-to-Treat Depression using
Synthetic Data from Large Language Models
- URL: http://arxiv.org/abs/2402.07645v1
- Date: Mon, 12 Feb 2024 13:34:33 GMT
- Title: Detecting the Clinical Features of Difficult-to-Treat Depression using
Synthetic Data from Large Language Models
- Authors: Isabelle Lorge, Dan W. Joyce, Niall Taylor, Alejo Nevado-Holgado,
Andrea Cipriani, Andrey Kormilitzin
- Abstract summary: We seek to develop a Large Language Model (LLM)-based tool capable of interrogating routinely-collected, narrative (free-text) electronic health record data.
We use LLM-generated synthetic data (GPT3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model.
We show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors.
- Score: 0.20971479389679337
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Difficult-to-treat depression (DTD) has been proposed as a broader and more
clinically comprehensive perspective on a person's depressive disorder where
despite treatment, they continue to experience significant burden. We sought to
develop a Large Language Model (LLM)-based tool capable of interrogating
routinely-collected, narrative (free-text) electronic health record (EHR) data
to locate published prognostic factors that capture the clinical syndrome of
DTD. In this work, we use LLM-generated synthetic data (GPT3.5) and a
Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction
model. The resulting model is then able to extract and label spans related to a
variety of relevant positive and negative factors in real clinical data (i.e.
spans of text that increase or decrease the likelihood of a patient matching
the DTD syndrome). We show it is possible to obtain good overall performance
(0.70 F1 across polarity) on real clinical data on a set of as many as 20
different factors, and high performance (0.85 F1 with 0.95 precision) on a
subset of important DTD factors such as history of abuse, family history of
affective disorder, illness severity and suicidality by training the model
exclusively on synthetic data. Our results show promise for future healthcare
applications, especially those where highly confidential medical data and
human-expert annotation would normally be required.
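The abstract names Non-Maximum Suppression (NMS) but does not say how it is applied to text spans. As a rough illustration only, the sketch below shows one common greedy form of NMS over scored character spans: keep the highest-scoring span, then discard any remaining span that overlaps a kept span too heavily. The `Span` layout, the function names `span_iou` and `nms_spans`, and the 0.5 overlap threshold are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical span layout: (start, end_exclusive, score). Not from the paper.
Span = Tuple[int, int, float]


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two 1D spans, measured in characters."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_spans(spans: List[Span], iou_threshold: float = 0.5) -> List[Span]:
    """Greedy NMS: visit spans by descending score and keep a span only if
    its overlap with every already-kept span is at or below the threshold."""
    kept: List[Span] = []
    for cand in sorted(spans, key=lambda s: s[2], reverse=True):
        if all(span_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Two clusters of overlapping candidates; one span survives per cluster.
candidates = [(0, 5, 0.9), (1, 6, 0.8), (10, 14, 0.7), (11, 14, 0.95)]
selected = nms_spans(candidates)
print(selected)
```

With these made-up candidates, the lower-scoring member of each overlapping pair is suppressed. A real pipeline would presumably also have to handle factor labels and polarity, which this sketch ignores.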
Related papers
- A BERT-Based Summarization approach for depression detection [1.7363112470483526]
Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed.
Machine learning and artificial intelligence can autonomously detect depression indicators from diverse data sources.
Our study proposes text summarization as a preprocessing technique to diminish the length and intricacies of input texts.
arXiv Detail & Related papers (2024-09-13T02:14:34Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Socially Aware Synthetic Data Generation for Suicidal Ideation Detection
Using Large Language Models [8.832297887534445]
We introduce an innovative strategy that leverages the capabilities of generative AI models to create synthetic data for suicidal ideation detection.
We benchmarked against state-of-the-art NLP classification models, specifically, those centered around the BERT family structures.
Our synthetic data-driven method, informed by social factors, offers consistent F1-scores of 0.82 for both models.
arXiv Detail & Related papers (2024-01-25T18:25:05Z)
- From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models [21.427976533706737]
We take a novel approach that leverages large language models to synthesize clinically useful insights from multi-sensor data.
We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data relate to conditions like depression and anxiety.
We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
arXiv Detail & Related papers (2023-11-21T23:53:27Z)
- GDPR Compliant Collection of Therapist-Patient-Dialogues [48.091760741427656]
We elaborate on the challenges we faced in starting our collection of therapist-patient dialogues in a psychiatry clinic under the General Data Protection Regulation (GDPR) of the European Union.
We give an overview of each step in our procedure and point out the potential pitfalls to motivate further research in this field.
arXiv Detail & Related papers (2022-11-22T15:51:10Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Deep Temporal Modelling of Clinical Depression through Social Media Text [1.513693945164213]
We develop a model to detect user-level clinical depression based on a user's temporal social media posts.
Our model uses a Depressive Symptoms Detection (DSD) classifier, which is trained on the largest existing samples of clinician-annotated tweets for clinical depression symptoms.
arXiv Detail & Related papers (2022-10-28T18:31:52Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction
Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general, clinical trial outcome predictions.
arXiv Detail & Related papers (2021-02-08T15:09:07Z)
- Longitudinal modeling of MS patient trajectories improves predictions of
disability progression [2.117653457384462]
This work addresses the task of optimally extracting information from longitudinal patient data in the real-world setting.
We show that with machine learning methods suited for patient trajectories modeling, we can predict disability progression of patients in a two-year horizon.
Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.
arXiv Detail & Related papers (2020-11-09T20:48:00Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)