ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
- URL: http://arxiv.org/abs/2508.03399v1
- Date: Tue, 05 Aug 2025 12:48:06 GMT
- Title: ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
- Authors: Eliseo Bao, Anxo Pérez, Javier Parapar
- Abstract summary: Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide. ReDSM5 is a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. We conduct an exploratory analysis of the collection, examining lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives.
- Score: 2.677715367737641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide, yet many cases remain undiagnosed due to barriers in traditional clinical access and pervasive stigma. Social media platforms, and Reddit in particular, offer rich, user-generated narratives that can reveal early signs of depressive symptomatology. However, existing computational approaches often label entire posts simply as depressed or not depressed, without linking language to specific criteria from the DSM-5, the standard clinical framework for diagnosing depression. This limits both clinical relevance and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. For each label, the annotator also provides a concise clinical rationale grounded in DSM-5 methodology. We conduct an exploratory analysis of the collection, examining lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives. Compared to prior resources, ReDSM5 uniquely combines symptom-specific supervision with expert explanations, facilitating the development of models that not only detect depression but also generate human-interpretable reasoning. We establish baseline benchmarks for both multi-label symptom classification and explanation generation, providing reference results for future research on detection and interpretability.
Related papers
- MAGI: Multi-Agent Guided Interview for Psychiatric Assessment [50.6150986786028]
We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational navigation. We show that MAGI advances LLM-assisted mental health assessment by combining clinical rigor, conversational adaptability, and explainable reasoning.
arXiv Detail & Related papers (2025-04-25T11:08:27Z) - MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases in the world. Privacy concerns limit the accessibility of personalized treatment data. MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z) - Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis [9.738105623317601]
We introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts.
We benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4.
While GPT-4 generally outperforms other models, none achieve an F1 score exceeding 72% in multi-class comorbid classification.
arXiv Detail & Related papers (2024-10-04T20:24:11Z) - MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents [25.987334407396396]
We design a neuro-symbolic multi-agent framework for synthesizing the diagnostic conversation of mental disorders. We develop the largest Chinese mental disorders diagnosis dataset, MDD-5k.
arXiv Detail & Related papers (2024-08-22T05:59:47Z) - LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z) - What Symptoms and How Long? An Interpretable AI Approach for Depression Detection in Social Media [0.5156484100374058]
Depression is the most prevalent and serious mental illness, which induces grave financial and societal ramifications.
This study contributes to IS literature with a novel interpretable deep learning model for depression detection in social media.
arXiv Detail & Related papers (2023-05-18T20:15:04Z) - Handwriting and Drawing for Depression Detection: A Preliminary Study [53.11777541341063]
The short-term effects of COVID-19 on mental health included a significant increase in anxiety and depressive symptoms.
The aim of this study is to use a new tool, online handwriting and drawing analysis, to discriminate between healthy individuals and depressed patients.
arXiv Detail & Related papers (2023-02-05T22:33:49Z) - Semantic Similarity Models for Depression Severity Estimation [53.72188878602294]
This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings.
We use test user sentences for producing semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels.
We evaluate our methods on two Reddit-based benchmarks, achieving a 30% improvement over the state of the art in terms of measuring depression severity.
arXiv Detail & Related papers (2022-11-14T18:47:26Z) - DEPTWEET: A Typology for Social Media Texts to Detect Depression Severities [0.46796109436086664]
We leverage the clinical articulation of depression to build a typology for social media texts for detecting the severity of depression.
It emulates the standard clinical assessment procedures, the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and the Patient Health Questionnaire (PHQ-9).
We present a new dataset of 40191 tweets labeled by expert annotators. Each tweet is labeled as 'non-depressed' or 'depressed'.
arXiv Detail & Related papers (2022-10-10T08:23:57Z) - D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat [25.852922703368133]
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms.
Due to the social stigma associated with mental illness, the dialogue data related to depression consultation and diagnosis are rarely disclosed.
We construct a Chinese dialogue dataset for Depression-Diagnosis-Oriented Chat which simulates the dialogue between doctors and patients during the diagnosis of depression.
arXiv Detail & Related papers (2022-05-24T03:54:22Z) - Data set creation and empirical analysis for detecting signs of depression from social media postings [0.0]
Depression is a common mental illness that has to be detected and treated at an early stage to avoid serious consequences.
We developed a gold standard data set for detecting the levels of depression as 'not depressed', 'moderately depressed' and 'severely depressed' from social media postings.
arXiv Detail & Related papers (2022-02-07T10:24:33Z) - Deep Multi-task Learning for Depression Detection and Prediction in Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally.
Machine learning techniques have proven effective in enabling automated detection and prediction of depression for early intervention and treatment.
We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks.
arXiv Detail & Related papers (2020-12-05T05:14:14Z)