Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from
Social Media
- URL: http://arxiv.org/abs/2307.02313v2
- Date: Thu, 6 Jul 2023 11:08:51 GMT
- Title: Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from
Social Media
- Authors: Ana-Maria Bucur
- Abstract summary: We present the contribution of the BLUE team in the eRisk Lab task on searching for symptoms of depression.
The task consists of retrieving and ranking Reddit social media sentences that convey symptoms of depression from the BDI-II questionnaire.
Our results show that using sentence embeddings from a model designed for semantic search outperforms the approach using embeddings from a model pre-trained on mental health data.
- Score: 7.868449549351487
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we present the contribution of the BLUE team in the eRisk Lab
task on searching for symptoms of depression. The task consists of retrieving
and ranking Reddit social media sentences that convey symptoms of depression
from the BDI-II questionnaire. Given that synthetic data generated by LLMs have
been shown to be reliable for augmenting data and fine-tuning
downstream models, we chose to generate synthetic data using ChatGPT for each
of the symptoms of the BDI-II questionnaire. We designed a prompt such that the
generated data contains more richness and semantic diversity than the BDI-II
responses for each question and, at the same time, contains emotional and
anecdotal experiences that are specific to the more intimate way of sharing
experiences on Reddit. We perform semantic search and rank the sentences'
relevance to the BDI-II symptoms by cosine similarity. We used two
state-of-the-art transformer-based models (MentalRoBERTa and a variant of
MPNet) for embedding the social media posts, the original and generated
responses of the BDI-II. Our results show that using sentence embeddings from a
model designed for semantic search outperforms the approach using embeddings
from a model pre-trained on mental health data. Furthermore, the generated
synthetic data proved too specific for this task: the approach relying only
on the BDI-II responses performed best.
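The ranking step described in the abstract (embed the queries and the candidate sentences, then order the sentences by cosine similarity) can be sketched as follows. This is a minimal illustration, not the author's implementation: the `embed` function is a toy bag-of-words stand-in for a real sentence encoder such as the MPNet variant or MentalRoBERTa, the max-over-queries aggregation and the `top_k` parameter are assumptions, and the example strings are invented rather than taken from the BDI-II or the eRisk data.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts.

    Assumption: a real pipeline would call a transformer model here.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(queries, sentences, top_k=10):
    """Rank sentences by their best cosine similarity to any query.

    Assumption: taking the maximum over queries is one plausible way to
    combine several query texts for a symptom; the abstract does not
    specify the aggregation.
    """
    query_vecs = [embed(q) for q in queries]
    if not query_vecs:
        return []
    scored = []
    for sentence in sentences:
        vec = embed(sentence)
        score = max(cosine(vec, q) for q in query_vecs)
        scored.append((score, sentence))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Invented query text and candidate sentences, for illustration only.
    queries = ["i feel sad"]
    sentences = ["i feel sad today", "the weather is nice", "sad"]
    for score, sentence in rank_sentences(queries, sentences):
        print(f"{score:.3f}  {sentence}")
```

Swapping the toy `embed` for a dense encoder changes only how the vectors are produced; the similarity ranking stays the same.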
Related papers
- INESC-ID @ eRisk 2025: Exploring Fine-Tuned, Similarity-Based, and Prompt-Based Approaches to Depression Symptom Identification [0.0]
We describe our team's approach to eRisk's 2025 Task 1: Search for Symptoms of Depression.
Given a set of sentences, participants were tasked with submitting up to 1,000 sentences per depression symptom.
Training data consisted of sentences labeled as to whether a given sentence was relevant or not.
We explored foundation model fine-tuning, sentence similarity, Large Language Model (LLM) prompting, and ensemble techniques.
arXiv Detail & Related papers (2025-06-03T14:25:12Z)
- An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval [51.10419281315848]
We conduct an empirical study to explore the potential of synthetic data for Text-Based Person Retrieval (TBPR) research.
We propose an inter-class image generation pipeline, in which an automatic prompt construction strategy is introduced.
We develop an intra-class image augmentation pipeline, in which the generative AI models are applied to further edit the images.
arXiv Detail & Related papers (2025-03-28T06:18:15Z)
- Synthetic Data Generation with LLM for Improved Depression Prediction [5.508617844957542]
We propose a pipeline for Large Language Models to generate synthetic data to improve the performance of depression prediction models.
Not only was the synthetic data satisfactory in terms of fidelity and privacy-preserving metrics, it also balanced the distribution of severity in the training dataset.
arXiv Detail & Related papers (2024-11-26T18:31:14Z)
- Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection [0.0]
The WHO reports that approximately 280 million people worldwide suffer from depression.
Our paper examined the performance of several ML algorithms for early-stage depression detection using two benchmark social media datasets.
arXiv Detail & Related papers (2024-09-07T07:47:55Z)
- Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings [0.0]
This study introduces a well-grounded approach to identify depressive social media posts in Bangla.
The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts.
To address the issue of class imbalance, we utilised random oversampling for the minority class.
arXiv Detail & Related papers (2024-07-12T11:40:17Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the scarcity of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- On Synthetic Data for Back Translation [66.6342561585953]
Back translation (BT) is one of the most significant technologies in NMT research fields.
We identify two key factors on synthetic data controlling the back-translation NMT performance, which are quality and importance.
We propose a simple yet effective method to generate synthetic data to better trade off both factors so as to yield a better performance for BT.
arXiv Detail & Related papers (2023-10-20T17:24:12Z)
- KESDT: knowledge enhanced shallow and deep Transformer for detecting adverse drug reactions [14.095117843726511]
We propose the Knowledge Enhanced Shallow and Deep Transformer (KESDT) model for ADR detection.
To cope with the first issue, we incorporate the domain keywords into the Transformer model through a shallow fusion manner.
To overcome the scarcity of annotated data, we integrate the synonym sets into the Transformer model through a deep fusion manner.
arXiv Detail & Related papers (2023-08-18T06:10:11Z)
- Depression detection in social media posts using affective and social norm features [84.12658971655253]
We propose a deep architecture for depression detection from social media posts.
We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme.
The inclusion of the proposed features yields state-of-the-art results in both settings.
arXiv Detail & Related papers (2023-03-24T21:26:27Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Semantic Similarity Models for Depression Severity Estimation [53.72188878602294]
This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings.
We use test user sentences for producing semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels.
We evaluate our methods on two Reddit-based benchmarks, achieving 30% improvement over state of the art in terms of measuring depression severity.
arXiv Detail & Related papers (2022-11-14T18:47:26Z)
- Depression Symptoms Modelling from Social Media Text: An Active Learning Approach [1.513693945164213]
We describe an Active Learning framework which uses an initial supervised learning model.
We harvest depression symptoms related samples from our large self-curated Depression Tweets Repository.
We show that we can produce a final dataset which is the largest of its kind.
arXiv Detail & Related papers (2022-09-06T18:41:57Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.