Related papers: Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis

URL: http://arxiv.org/abs/2410.03908v1
Date: Fri, 4 Oct 2024 20:24:11 GMT
Title: Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis
Authors: Amey Hengle, Atharva Kulkarni, Shantanu Patankar, Madhumitha Chandrasekaran, Sneha D'Silva, Jemima Jacob, Rashmi Gupta
Abstract summary: We introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. We benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. While GPT-4 generally outperforms other models, none achieve an F1 score exceeding 72% in multi-class comorbid classification.
Score: 9.738105623317601
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets that often oversimplify the intricate interplay between different mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST posits a more representative sample of online mental health discourse. Moreover, we benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally outperforms other models, none achieve an F1 score exceeding 72% in multi-class comorbid classification, underscoring the ongoing challenges in applying language models to mental health diagnostics.

Related papers

LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases in the world. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework for training language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z)
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities [25.305909441170993]
Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide.
arXiv Detail & Related papers (2024-07-08T17:00:51Z)
LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains. The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
Assessing ML Classification Algorithms and NLP Techniques for Depression Detection: An Experimental Case Study [0.6524460254566905]
Depression has affected millions of people worldwide and has become one of the most common mental disorders. Recent research has shown that machine learning (ML) and Natural Language Processing (NLP) tools and techniques are widely used to diagnose depression. However, several challenges remain in assessing depression detection approaches when other conditions, such as post-traumatic stress disorder (PTSD), are also present.
arXiv Detail & Related papers (2024-04-03T19:45:40Z)
Mental Health Diagnosis in the Digital Age: Harnessing Sentiment Analysis on Social Media Platforms upon Ultra-Sparse Feature Content [3.6195994708545016]
We propose a novel semantic feature preprocessing technique with a three-fold structure. With enhanced semantic features, we train a machine learning model to predict and classify mental disorders. Our methods, when compared to seven benchmark models, demonstrate significant performance improvements.
arXiv Detail & Related papers (2023-11-09T00:15:06Z)
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
Handwriting and Drawing for Depression Detection: A Preliminary Study [53.11777541341063]
Short-term effects of COVID-19 on mental health included a significant increase in anxiety and depressive symptoms. The aim of this study is to use a new tool, online handwriting and drawing analysis, to discriminate between healthy individuals and depressed patients.
arXiv Detail & Related papers (2023-02-05T22:33:49Z)
Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting [2.0972270756982536]
Speech patterns have been identified as potential diagnostic markers for neuropsychiatric conditions. We tested the performance of a range of machine learning models and advanced Transformer models on both binary and multiclass classification. Our results indicate that models trained on binary classification may learn to rely on markers of generic differences between clinical and non-clinical populations.
arXiv Detail & Related papers (2023-01-13T08:24:21Z)
Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media [27.799032561722893]
We report on experiments aimed at predicting six conditions (anxiety, attention deficit hyperactivity disorder, bipolar disorder, post-traumatic stress disorder, depression, and psychological stress) from Reddit social media posts. We explore and compare the performance of hybrid and ensemble models leveraging transformer-based architectures (BERT and RoBERTa) and BiLSTM neural networks trained on within-text distributions of a diverse set of linguistic features. In addition, we conduct feature ablation experiments to investigate which types of features are most indicative of particular mental health conditions.
arXiv Detail & Related papers (2022-12-19T20:31:47Z)
Deep Multi-task Learning for Depression Detection and Prediction in Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally. Machine learning techniques have proven effective in enabling automated detection and prediction of depression for early intervention and treatment. We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks.
arXiv Detail & Related papers (2020-12-05T05:14:14Z)
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community. We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification. We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.