Mental Multi-class Classification on Social Media: Benchmarking Transformer Architectures against LSTM Models
- URL: http://arxiv.org/abs/2509.16542v1
- Date: Sat, 20 Sep 2025 05:41:59 GMT
- Title: Mental Multi-class Classification on Social Media: Benchmarking Transformer Architectures against LSTM Models
- Authors: Khalid Hasan, Jamil Saquer, Yifan Zhang
- Abstract summary: We present a large-scale comparative study of state-of-the-art transformer versus Long Short-Term Memory (LSTM)-based models to classify mental health posts. We first curate a large dataset of Reddit posts spanning six mental health conditions and a control group, using rigorous filtering and statistical exploratory analysis to ensure annotation quality. Experimental results show that transformer models consistently outperform the alternatives, with RoBERTa achieving 91-99% F1-scores and accuracies across all classes.
- Score: 7.464241214592479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Millions of people openly share mental health struggles on social media, providing rich data for early detection of conditions such as depression, bipolar disorder, etc. However, most prior Natural Language Processing (NLP) research has focused on single-disorder identification, leaving a gap in understanding the efficacy of advanced NLP techniques for distinguishing among multiple mental health conditions. In this work, we present a large-scale comparative study of state-of-the-art transformer versus Long Short-Term Memory (LSTM)-based models to classify mental health posts into exclusive categories of mental health conditions. We first curate a large dataset of Reddit posts spanning six mental health conditions and a control group, using rigorous filtering and statistical exploratory analysis to ensure annotation quality. We then evaluate five transformer architectures (BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA) against several LSTM variants (with or without attention, using contextual or static embeddings) under identical conditions. Experimental results show that transformer models consistently outperform the alternatives, with RoBERTa achieving 91-99% F1-scores and accuracies across all classes. Notably, attention-augmented LSTMs with BERT embeddings approach transformer performance (up to 97% F1-score) while training 2-3.5 times faster, whereas LSTMs using static embeddings fail to learn useful signals. These findings represent the first comprehensive benchmark for multi-class mental health detection, offering practical guidance on model selection and highlighting an accuracy-efficiency trade-off for real-world deployment of mental health NLP systems.
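The abstract reports per-class F1-scores over a 7-way label set (six conditions plus a control group). As a reminder of how that metric is computed, here is a minimal, self-contained sketch; the label names are illustrative placeholders, not the paper's exact categories.

```python
def per_class_f1(y_true, y_pred, labels):
    """Compute the F1-score separately for each class in a multi-class task."""
    scores = {}
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores

# Illustrative 7-way label set: six conditions plus a control group.
LABELS = ["depression", "anxiety", "bipolar", "ptsd", "adhd", "ocd", "control"]
```

Reporting one F1 per class, as the paper does, exposes weak classes that a single micro-averaged accuracy would hide.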
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which limits generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z) - Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data [1.5930654066091687]
Mental health disorders affect over one-fifth of adults globally. Using the DAIC-WOZ dataset of clinical interviews, we fine-tuned models for anxiety, depression, and stress classification. For stress detection, a zero-shot synthetic approach reached an F1 of 0.884 and ROC AUC of 0.886.
arXiv Detail & Related papers (2025-11-10T12:38:26Z) - multiMentalRoBERTa: A Fine-tuned Multiclass Classifier for Mental Health Disorder [0.6308539010172308]
The early detection of mental health disorders from social media text is critical for enabling timely support, risk assessment, and referral to appropriate resources. This work introduces multiMentalRoBERTa, a fine-tuned RoBERTa model designed for multiclass classification of common mental health conditions.
arXiv Detail & Related papers (2025-11-01T03:55:48Z) - Beyond Architectures: Evaluating the Role of Contextual Embeddings in Detecting Bipolar Disorder on Social Media [0.18416014644193066]
Bipolar disorder is a chronic mental illness frequently underdiagnosed due to subtle early symptoms and social stigma. This paper explores advanced natural language processing (NLP) models for recognizing signs of bipolar disorder in user-generated social media text.
arXiv Detail & Related papers (2025-07-17T05:14:19Z) - Advancing Mental Disorder Detection: A Comparative Evaluation of Transformer and LSTM Architectures on Social Media [0.16385815610837165]
This study provides a comprehensive evaluation of state-of-the-art transformer models against Long Short-Term Memory (LSTM) based approaches. We construct a large annotated dataset using different text embedding techniques for mental health disorder classification on Reddit. Experimental results demonstrate the superior performance of transformer models over traditional deep-learning approaches.
arXiv Detail & Related papers (2025-07-17T04:58:31Z) - MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis [58.67342568632529]
MoodAngels is the first specialized multi-agent framework for mood disorder diagnosis. MoodSyn is an open-source dataset of 1,173 synthetic psychiatric cases.
arXiv Detail & Related papers (2025-06-04T09:18:25Z) - Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models [6.916082619621498]
Post-Traumatic Stress Disorder (PTSD) remains underdiagnosed in clinical settings. This study evaluates natural language processing approaches for detecting PTSD from clinical interview transcripts.
arXiv Detail & Related papers (2025-04-01T22:06:28Z) - Cognitive-Mental-LLM: Evaluating Reasoning in Large Language Models for Mental Health Prediction via Online Text [0.0]
This study evaluates structured reasoning techniques, namely Chain-of-Thought (CoT), Self-Consistency (SC-CoT), and Tree-of-Thought (ToT), to improve classification accuracy across multiple mental health datasets sourced from Reddit. We analyze reasoning-driven prompting strategies, including Zero-shot CoT and Few-shot CoT, using key performance metrics such as Balanced Accuracy, F1 score, and Sensitivity/Specificity. Our findings indicate that reasoning-enhanced techniques improve classification performance over direct prediction, particularly in complex cases.
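As a rough illustration of the zero-shot CoT prompting strategy this summary mentions, the sketch below appends a reasoning trigger to a classification instruction. The wording is illustrative, not taken from the paper.

```python
def zero_shot_cot_prompt(post: str) -> str:
    """Build a zero-shot chain-of-thought classification prompt.

    Zero-shot CoT adds a reasoning trigger ("Let's think step by step")
    instead of providing worked examples, as few-shot CoT would.
    """
    return (
        "Classify the mental health condition expressed in the following post.\n\n"
        f"Post: {post}\n\n"
        "Let's think step by step, then give a single label as the final answer."
    )
```

Few-shot CoT differs only in prepending a handful of labeled example posts with worked reasoning before the target post.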
arXiv Detail & Related papers (2025-03-13T06:42:37Z) - LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions [46.60244609728416]
Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a litmus test of a model's utility in clinical practice.
We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs).
We reveal four surprising results about LMs/LLMs.
arXiv Detail & Related papers (2024-06-17T19:50:40Z) - MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z) - Performance of Dual-Augmented Lagrangian Method and Common Spatial Patterns applied in classification of Motor-Imagery BCI [68.8204255655161]
Motor-imagery based brain-computer interfaces (MI-BCI) have the potential to become ground-breaking technologies for neurorehabilitation.
Due to the noisy nature of the used EEG signal, reliable BCI systems require specialized procedures for features optimization and extraction.
arXiv Detail & Related papers (2020-10-13T20:50:13Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem to the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with improved beam search, reaches quality only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.