INESC-ID @ eRisk 2025: Exploring Fine-Tuned, Similarity-Based, and Prompt-Based Approaches to Depression Symptom Identification
- URL: http://arxiv.org/abs/2506.02924v1
- Date: Tue, 03 Jun 2025 14:25:12 GMT
- Title: INESC-ID @ eRisk 2025: Exploring Fine-Tuned, Similarity-Based, and Prompt-Based Approaches to Depression Symptom Identification
- Authors: Diogo A. P. Nunes, Eugénio Ribeiro,
- Abstract summary: We describe our team's approach to eRisk's 2025 Task 1: Search for Symptoms of Depression. Given a set of sentences, participants were tasked with submitting up to 1,000 sentences per depression symptom. Training data consisted of sentences labeled as to whether a given sentence was relevant or not. We explored foundation model fine-tuning, sentence similarity, Large Language Model (LLM) prompting, and ensemble techniques.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we describe our team's approach to eRisk's 2025 Task 1: Search for Symptoms of Depression. Given a set of sentences and the Beck Depression Inventory-II (BDI) questionnaire, participants were tasked with submitting up to 1,000 sentences per depression symptom in the BDI, sorted by relevance. Participant submissions were evaluated according to standard Information Retrieval (IR) metrics, including Average Precision (AP) and R-Precision (R-PREC). The provided training data, however, consisted of sentences labeled only as relevant or not with respect to a single BDI symptom. Given this labeling limitation, we framed development as a binary classification task for each BDI symptom and evaluated accordingly. To that end, we split the available labeled data into training and validation sets, and explored foundation model fine-tuning, sentence similarity, Large Language Model (LLM) prompting, and ensemble techniques. The validation results revealed that fine-tuning foundation models yielded the best performance, particularly when enhanced with synthetic data to mitigate class imbalance. We also observed that the optimal approach varied by symptom. Based on these insights, we devised five independent test runs, two of which used ensemble methods. These runs achieved the highest scores in the official IR evaluation, outperforming submissions from 16 other teams.
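The IR metrics named above, Average Precision and R-Precision, can be sketched in plain Python. This is a minimal illustration of the standard definitions, not the official eRisk evaluation code; the function names are my own.

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@k over the ranks k at which a relevant item appears,
    normalized by the total number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant)

def r_precision(ranked, relevant):
    """R-PREC: precision over the top R results, where R = number of relevant items."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r
```

For example, with ranking `["a", "b", "c", "d"]` and relevant set `{"a", "c"}`, AP is (1/1 + 2/3) / 2 ≈ 0.833 and R-Precision is 1/2.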
Related papers
- A Gold Standard Dataset and Evaluation Framework for Depression Detection and Explanation in Social Media using LLMs [0.0]
Early detection of depression from online social media posts holds promise for providing timely mental health interventions. We present a high-quality, expert-annotated dataset of 1,017 social media posts labeled with depressive spans and mapped to 12 depression symptom categories.
arXiv Detail & Related papers (2025-07-26T10:01:55Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection [0.0]
The WHO reports that approximately 280 million people worldwide suffer from depression.
Our paper examined the performance of several ML algorithms for early-stage depression detection using two benchmark social media datasets.
arXiv Detail & Related papers (2024-09-07T07:47:55Z) - Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z) - DepreSym: A Depression Symptom Annotated Corpus and the Role of LLMs as Assessors of Psychological Markers [3.5511184956329727]
We present the DepreSym dataset, consisting of 21,580 sentences annotated according to their relevance to the Beck Depression Inventory-II symptoms.
This dataset serves as a valuable resource for advancing the development of models that incorporate depressive markers such as clinical symptoms.
arXiv Detail & Related papers (2023-08-21T14:44:31Z) - Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media [7.868449549351487]
We present the contribution of the BLUE team in the eRisk Lab task on searching for symptoms of depression.
The task consists of retrieving and ranking Reddit social media sentences that convey symptoms of depression from the BDI-II questionnaire.
Our results show that using sentence embeddings from a model designed for semantic search outperforms the approach using embeddings from a model pre-trained on mental health data.
arXiv Detail & Related papers (2023-07-05T14:15:15Z) - Semantic Similarity Models for Depression Severity Estimation [53.72188878602294]
This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings.
We use test user sentences for producing semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels.
We evaluate our methods on two Reddit-based benchmarks, achieving a 30% improvement over the state of the art in measuring depression severity.
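The semantic-ranking idea above (and the sentence-similarity approach explored in the main paper) can be sketched as cosine-similarity ranking over sentence embeddings. The 2-D vectors below are toy stand-ins; in practice they would come from a sentence encoder, and the function name is my own.

```python
import numpy as np

def rank_by_similarity(query_vec, sentence_vecs):
    """Rank sentence indices by cosine similarity to a query (symptom) embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    m = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    scores = m @ q                  # cosine similarity of each sentence to the query
    order = np.argsort(-scores)     # indices sorted by descending similarity
    return order, scores[order]

# Toy stand-in embeddings for one symptom query and three candidate sentences.
query = np.array([1.0, 0.0])
sents = np.array([[0.9, 0.1],
                  [0.0, 1.0],
                  [0.7, 0.7]])
order, scores = rank_by_similarity(query, sents)
```

Here the first sentence is most aligned with the query and the second is orthogonal to it, so the returned order is `[0, 2, 1]`.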
arXiv Detail & Related papers (2022-11-14T18:47:26Z) - Depression Symptoms Modelling from Social Media Text: An Active Learning Approach [1.513693945164213]
We describe an Active Learning framework which uses an initial supervised learning model.
We harvest depression symptoms related samples from our large self-curated Depression Tweets Repository.
We show that we can produce a final dataset which is the largest of its kind.
arXiv Detail & Related papers (2022-09-06T18:41:57Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis [142.90770786804507]
Medical diagnosis assistant (MDA) aims to build an interactive diagnostic agent to sequentially inquire about symptoms for discriminating diseases.
This work attempts to address these critical issues in MDA by taking advantage of the causal diagram.
We propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records.
arXiv Detail & Related papers (2020-03-14T02:05:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.