Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums
- URL: http://arxiv.org/abs/2404.16461v2
- Date: Fri, 26 Apr 2024 11:36:28 GMT
- Title: Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums
- Authors: Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin
- Abstract summary: Mental health in children and adolescents has been steadily deteriorating over the past few years.
We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT.
We create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it.
We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mental health in children and adolescents has been steadily deteriorating over the past few years. The recent advent of Large Language Models (LLMs) offers much hope for cost and time efficient scaling of monitoring and intervention, yet despite specifically prevalent issues such as school bullying and eating disorders, previous studies have not investigated performance in this domain or for open information extraction, where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT, and compare expert labels to annotations from two top performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher. However, we find the model still occasionally errs on issues of negation and factuality, and higher performance on synthetic data is driven by greater complexity of real data rather than inherent advantage.
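The comparison described in the abstract (model annotations measured against human inter-annotator agreement) can be illustrated with a small sketch. The abstract does not state which agreement measure the authors use, so Cohen's kappa is assumed here purely for illustration. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: kappa stands in for the paper's agreement measure,
    which the abstract does not specify.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 means the category (e.g. SYMPTOMS) was marked
# as present in a post, 0 means absent. These values are invented.
expert_a = [1, 0, 1, 1, 0, 0]
expert_b = [1, 0, 0, 1, 0, 1]
model = [1, 0, 1, 1, 0, 1]

human_agreement = cohen_kappa(expert_a, expert_b)
model_agreement = cohen_kappa(expert_a, model)
print(f"expert vs expert: {human_agreement:.3f}")
print(f"expert vs model:  {model_agreement:.3f}")
```

Under this reading, a model would count as "on par" with experts when its agreement with an expert falls in the same range as agreement between two experts. In practice the score would be computed per category over the full annotated set.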
Related papers
- Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks [0.7071166713283337]
We created datasets large enough to train machine learning models.
Our goal is to label behaviors corresponding to autism criteria.
Augmenting data increased recall by 13% but decreased precision by 16%.
arXiv Detail & Related papers (2024-05-08T03:18:12Z) - Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
arXiv Detail & Related papers (2024-04-10T14:50:10Z) - Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data [27.13970925299262]
We develop HEADROOM, a synthetic dataset of 3,120 posts about depression-triggering stressors.
We conduct semantic and lexical analyses to identify the predominant stressors for each demographic group.
We present the procedures to generate queries to develop depression data using GPT-3, and conduct analyses to uncover the types of stressors it assigns to demographic groups.
arXiv Detail & Related papers (2024-03-25T16:21:25Z) - ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs [69.27030571729392]
Large Language models (LLMs) exhibit harmful social biases.
This work introduces a novel approach utilizing ChatGPT to generate synthetic training data.
arXiv Detail & Related papers (2024-02-19T01:28:48Z) - Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models [0.20971479389679337]
We seek to develop a Large Language Model (LLM)-based tool capable of interrogating routinely-collected, narrative (free-text) electronic health record data.
We use LLM-generated synthetic data (GPT3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model.
We show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD
arXiv Detail & Related papers (2024-02-12T13:34:33Z) - Discovery of the Hidden World with Large Language Models [100.38157787218044]
We introduce COAT: Causal representatiOn AssistanT.
COAT incorporates LLMs as a factor proposer that extracts the potential causal factors from unstructured data.
LLMs can also be instructed to provide additional information used to collect data values.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations [21.583825474908334]
We study how the performance of models trained on synthetic data may vary with the subjectivity of classification.
Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data.
arXiv Detail & Related papers (2023-10-11T19:51:13Z) - The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks [0.0]
We compare the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks.
Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts.
arXiv Detail & Related papers (2023-04-26T23:09:02Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting Time-varying Confounding in Modern Longitudinal Observational Data [68.29870617697532]
We propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite) for time-varying confounding in longitudinal data.
DeepRite is shown to recover the ground truth from synthetic data, and estimate unbiased treatment effects from real data.
arXiv Detail & Related papers (2020-10-28T15:05:08Z)