Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment
- URL: http://arxiv.org/abs/2504.04891v1
- Date: Mon, 07 Apr 2025 09:58:19 GMT
- Title: Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment
- Authors: Longdi Xian, Jianzhang Ni, Mingzhu Wang
- Abstract summary: DeepSeek V3 is the most reliable and cost-effective model for depression detection. The model maintains consistently high AUCs for detecting depression in complex diagnostic scenarios.
- Score: 0.7373617024876725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression is a prevalent mental health disorder that is difficult to detect early because symptom assessment is largely subjective. Recent advances in large language models offer efficient and cost-effective approaches to this task. In this study, we evaluated the performance of four LLMs in depression detection using clinical interview data. We selected the best-performing model and further tested it in a severity-evaluation scenario and a knowledge-enhanced scenario. Robustness was evaluated in complex diagnostic scenarios using a dataset of 51,074 statements spanning six different mental disorders. We found that DeepSeek V3 is the most reliable and cost-effective model for depression detection, performing well in both zero-shot and few-shot settings, with zero-shot being the most efficient choice. Severity evaluation showed low agreement with the human evaluator, particularly for mild depression. The model maintained consistently high AUCs for detecting depression in complex diagnostic scenarios. These findings highlight DeepSeek V3's strong potential for text-based depression detection in real-world clinical applications. However, they also underscore the need for further refinement in severity assessment and mitigation of potential biases to enhance clinical reliability.
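The abstract describes the evaluation setup only at a high level: zero-shot prompting of an LLM over interview-derived text, followed by AUC scoring against clinical labels. The sketch below illustrates what such a zero-shot pass could look like; the paper does not publish prompts or code, so the prompt wording, the 0-10 rating scale, and the `deepseek-chat` model name and API endpoint are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: zero-shot depression screening via an OpenAI-compatible
# chat API (e.g., DeepSeek) plus AUC evaluation. Prompt wording, model name,
# endpoint, and the 0-10 scale are assumptions, not the paper's setup.
from openai import OpenAI
from sklearn.metrics import roc_auc_score

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

PROMPT = (
    "You are screening clinical interview transcripts for depression.\n"
    "Rate the likelihood that the speaker has depression on a scale "
    "from 0 (none) to 10 (very likely). Reply with the number only.\n\n"
    "Transcript:\n{transcript}"
)

def depression_score(transcript: str) -> float:
    """Return a 0-1 risk score from a single zero-shot LLM call."""
    resp = client.chat.completions.create(
        model="deepseek-chat",   # assumed model identifier
        temperature=0,           # deterministic scoring
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    raw = resp.choices[0].message.content.strip()
    try:
        return float(raw) / 10.0  # normalize the 0-10 rating
    except ValueError:
        return 0.5                # unparsable reply -> uninformative score

# Toy evaluation: labels are 1 for depressed, 0 otherwise.
transcripts = ["...interview text...", "...interview text..."]
labels = [1, 0]
scores = [depression_score(t) for t in transcripts]
print("AUC:", roc_auc_score(labels, scores))
```

The severity-agreement analysis mentioned in the abstract could be measured analogously by comparing model-assigned and clinician-assigned severity categories, for example with `sklearn.metrics.cohen_kappa_score`.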
Related papers
- Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis.
We focus on the selective prediction approach, which allows the diagnosis system to abstain from providing a decision if it is not confident in the diagnosis.
We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z)
- Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models [9.43184936918456]
Depression is one of the leading causes of disability worldwide.
Recent advancements in Large Language Models have shown promise in addressing mental health challenges.
We propose a Chain-of-Thought Prompting approach that enhances both the performance and interpretability of depression detection.
arXiv Detail & Related papers (2025-02-09T12:30:57Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
- GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot Study [0.6999740786886538]
We explore the use of GPT-4 for clinical depression assessment based on transcript analysis.
We examine the model's ability to classify patient interviews into binary categories: depressed and not depressed.
Results indicate that GPT-4 exhibits considerable variability in accuracy and F1-Score across configurations.
arXiv Detail & Related papers (2024-12-31T00:32:43Z)
- A BERT-Based Summarization approach for depression detection [1.7363112470483526]
Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed.
Machine learning and artificial intelligence can autonomously detect depression indicators from diverse data sources.
Our study proposes text summarization as a preprocessing technique to diminish the length and intricacies of input texts.
arXiv Detail & Related papers (2024-09-13T02:14:34Z)
- Assessing ML Classification Algorithms and NLP Techniques for Depression Detection: An Experimental Case Study [0.6524460254566905]
Depression has affected millions of people worldwide and has become one of the most common mental disorders.
Recent research has shown that machine learning (ML) and Natural Language Processing (NLP) tools and techniques are widely used to diagnose depression.
However, there are still several challenges in the assessment of depression detection approaches in which other conditions such as post-traumatic stress disorder (PTSD) are present.
arXiv Detail & Related papers (2024-04-03T19:45:40Z)
- Depression Detection on Social Media with Large Language Models [23.075317886505193]
Depression detection aims to determine whether an individual suffers from depression by analyzing their history of posts on social media.
We propose a novel depression detection system called DORIS, combining medical knowledge and the recent advances in large language models.
arXiv Detail & Related papers (2024-03-16T01:01:16Z)
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting [82.64015366154884]
We study the task of cognitive distortion detection and propose the Diagnosis of Thought (DoT) prompting.
DoT performs diagnosis on the patient's speech via three stages: subjectivity assessment to separate the facts and the thoughts; contrastive reasoning to elicit the reasoning processes supporting and contradicting the thoughts; and schema analysis to summarize the cognition schemas.
Experiments demonstrate that DoT obtains significant improvements over ChatGPT for cognitive distortion detection, while generating high-quality rationales approved by human experts.
arXiv Detail & Related papers (2023-10-11T02:47:21Z)
- Deep Multi-task Learning for Depression Detection and Prediction in Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally.
Machine learning techniques have proven effective for automated detection and prediction of depression, enabling early intervention and treatment.
We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks (a minimal sketch of this multi-task pattern appears after this list).
arXiv Detail & Related papers (2020-12-05T05:14:14Z)
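The last related paper above describes depression classification jointly optimized with two auxiliary tasks but gives no architectural detail. As a generic illustration of that multi-task pattern (not the authors' network; the auxiliary tasks, layer sizes, and loss weights below are invented for the example), a shared recurrent encoder can feed a main classification head plus auxiliary heads whose losses are summed:

```python
# Minimal multi-task RNN sketch (PyTorch): a shared GRU encoder with one main
# depression-classification head and two auxiliary heads. Task choices, sizes,
# and loss weights are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class MultiTaskRNN(nn.Module):
    def __init__(self, input_dim=64, hidden_dim=128, n_severity_levels=4):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.depression_head = nn.Linear(hidden_dim, 1)                # main task: depressed vs. not
        self.severity_head = nn.Linear(hidden_dim, n_severity_levels)  # auxiliary task 1 (assumed)
        self.mood_head = nn.Linear(hidden_dim, 1)                      # auxiliary task 2 (assumed)

    def forward(self, x):
        _, h = self.encoder(x)   # h: (1, batch, hidden_dim), last hidden state
        h = h.squeeze(0)
        return self.depression_head(h), self.severity_head(h), self.mood_head(h)

model = MultiTaskRNN()
bce, ce, mse = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss(), nn.MSELoss()

# Toy batch: 8 sequences of 20 feature vectors, plus labels for each task.
x = torch.randn(8, 20, 64)
y_dep = torch.randint(0, 2, (8, 1)).float()
y_sev = torch.randint(0, 4, (8,))
y_mood = torch.randn(8, 1)

dep_logits, sev_logits, mood_pred = model(x)
# Joint optimization: main loss plus down-weighted auxiliary losses.
loss = bce(dep_logits, y_dep) + 0.5 * ce(sev_logits, y_sev) + 0.5 * mse(mood_pred, y_mood)
loss.backward()
```

In this kind of setup the auxiliary losses act mainly as regularizers on the shared representation; the 0.5 weights here are arbitrary tuning knobs in the sketch.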