Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data
- URL: http://arxiv.org/abs/2511.07044v1
- Date: Mon, 10 Nov 2025 12:38:26 GMT
- Title: Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data
- Authors: Mihael Arcan, David-Paul Niland
- Abstract summary: Mental health disorders affect over one-fifth of adults globally. Using the DAIC-WOZ dataset of clinical interviews, we fine-tuned models for anxiety, depression, and stress classification. For stress detection, a zero-shot synthetic approach reached an F1 of 0.884 and a ROC AUC of 0.886.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mental health disorders affect over one-fifth of adults globally, yet detecting such conditions from text remains challenging due to the subtle and varied nature of symptom expression. This study evaluates multiple approaches for mental health detection, comparing Large Language Models (LLMs) such as Llama and GPT with classical machine learning and transformer-based architectures including BERT, XLNet, and Distil-RoBERTa. Using the DAIC-WOZ dataset of clinical interviews, we fine-tuned models for anxiety, depression, and stress classification and applied synthetic data generation to mitigate class imbalance. Results show that Distil-RoBERTa achieved the highest F1 score (0.883) for GAD-2, while XLNet outperformed others on PHQ tasks (F1 up to 0.891). For stress detection, a zero-shot synthetic approach (SD+Zero-Shot-Basic) reached an F1 of 0.884 and ROC AUC of 0.886. Findings demonstrate the effectiveness of transformer-based models and highlight the value of synthetic data in improving recall and generalization. However, careful calibration is required to prevent precision loss. Overall, this work emphasizes the potential of combining advanced language models and data augmentation to enhance automated mental health assessment from text.
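The abstract's pipeline pairs synthetic minority-class augmentation with F1 and ROC AUC evaluation. The sketch below illustrates that pattern on toy numeric data with a simple oversampling stand-in for LLM-generated text; the features, labels, and noise-based augmentation are all hypothetical, not the paper's actual method or data.

```python
# Minimal sketch (hypothetical data): mitigate class imbalance by
# oversampling the minority class, then score with F1 and ROC AUC,
# the two metrics reported in the paper.
import random
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

random.seed(0)

# Toy imbalanced dataset: 2-d features, binary "stress" labels (9:1 skew).
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(90)] \
  + [[random.gauss(2, 1), random.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

# Naive synthetic augmentation: jittered copies of minority examples
# until classes balance (a stand-in for LLM-generated interview text).
minority = [x for x, label in zip(X, y) if label == 1]
while Counter(y)[1] < Counter(y)[0]:
    base = random.choice(minority)
    X.append([v + random.gauss(0, 0.1) for v in base])
    y.append(1)

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)
proba = [p[1] for p in clf.predict_proba(X)]
print(f"F1: {f1_score(y, pred):.3f}, ROC AUC: {roc_auc_score(y, proba):.3f}")
```

As the abstract cautions, oversampling tends to lift recall at the cost of precision, so calibrating the decision threshold on a held-out split matters in practice.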
Related papers
- DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection [54.209716321122194]
We present DepFlow, a depression-conditioned text-to-speech framework. A Depression Acoustic Camouflage module learns speaker- and content-invariant depression embeddings through adversarial training. A flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity. A prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum.
arXiv Detail & Related papers (2026-01-01T10:44:38Z)
- Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects [42.465094107111646]
This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases. The research utilised a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile.
arXiv Detail & Related papers (2025-11-06T23:20:37Z)
- Mental Multi-class Classification on Social Media: Benchmarking Transformer Architectures against LSTM Models [7.464241214592479]
We present a large-scale comparative study of state-of-the-art transformer versus Long Short-Term Memory (LSTM)-based models to classify mental health posts. We first curate a large dataset of Reddit posts spanning six mental health conditions and a control group, using rigorous filtering and statistical exploratory analysis to ensure annotation quality. Experimental results show that transformer models consistently outperform the alternatives, with RoBERTa achieving 91-99% F1-scores and accuracies across all classes.
arXiv Detail & Related papers (2025-09-20T05:41:59Z)
- LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data [32.69241041313969]
Alzheimer's disease and related dementias affect nearly five million older adults in the United States. This study develops and evaluates a speech-based screening pipeline integrating transformer embeddings with handcrafted linguistic features.
arXiv Detail & Related papers (2025-08-08T13:44:55Z)
- Advancing Mental Disorder Detection: A Comparative Evaluation of Transformer and LSTM Architectures on Social Media [0.16385815610837165]
This study provides a comprehensive evaluation of state-of-the-art transformer models against Long Short-Term Memory (LSTM) based approaches. We construct a large annotated dataset using different text embedding techniques for mental health disorder classification on Reddit. Experimental results demonstrate the superior performance of transformer models over traditional deep-learning approaches.
arXiv Detail & Related papers (2025-07-17T04:58:31Z)
- Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging [40.35825564674249]
This study introduces the first structured benchmark to assess the robustness and efficiency of transfer learning strategies for Foundation Models. Four publicly available COVID-19 chest X-ray datasets were used, covering mortality, severity, and admission. CNNs pretrained on ImageNet and FMs pretrained on general or biomedical datasets were adapted using full fine-tuning, linear probing, and parameter-efficient methods.
arXiv Detail & Related papers (2025-06-23T09:16:04Z)
- Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models [6.916082619621498]
Post-Traumatic Stress Disorder (PTSD) remains underdiagnosed in clinical settings. This study evaluates natural language processing approaches for detecting PTSD from clinical interview transcripts.
arXiv Detail & Related papers (2025-04-01T22:06:28Z)
- Enhanced Large Language Models for Effective Screening of Depression and Anxiety [44.81045754697482]
This paper introduces a pipeline for synthesizing clinical interviews, resulting in 1,157 interactive dialogues (PsyInterview). EmoScan distinguishes between coarse (e.g., anxiety or depressive disorders) and fine-grained disorders (e.g., major depressive disorder) and conducts high-quality interviews.
arXiv Detail & Related papers (2025-01-15T12:42:09Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers [0.0]
Dysarthria is a speech disorder that hinders communication due to difficulties in articulating words.
Much of the literature has focused on improving ASR systems for dysarthric speech.
This work aims to develop models that can accurately classify the presence of dysarthria.
arXiv Detail & Related papers (2023-09-17T17:23:41Z)
- Large Language Models to Identify Social Determinants of Health in Electronic Health Records [2.168737004368243]
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHRs).
This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented.
800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated.
arXiv Detail & Related papers (2023-08-11T19:18:35Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
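The last entry above describes a contrastive regularized clinical model built on positive sampling strategies for EHR data. The sketch below shows a generic InfoNCE-style contrastive loss, the family of objectives such models build on; the toy embeddings and helper names are illustrative, not the paper's actual formulation.

```python
# Hedged sketch: InfoNCE-style contrastive loss on toy "patient" embeddings.
# Pulls the anchor toward its positive and away from the negatives.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(sim(a,p)/t) / (exp(sim(a,p)/t) + sum_n exp(sim(a,n)/t)) )"""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy embeddings: the positive sits near the anchor, the negatives do not.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, 1.0]]
print(f"loss = {info_nce(anchor, positive, negatives):.4f}")
```

The positive-sampling strategies in the paper decide which record pairs play the `positive` role here; the loss itself then acts as the regularizer on the classifier's representations.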