Early Linguistic Pattern of Anxiety from Social Media Using Interpretable Linguistic Features: A Multi-Faceted Validation Study with Author-Disjoint Evaluation
- URL: http://arxiv.org/abs/2601.11758v1
- Date: Fri, 16 Jan 2026 20:22:34 GMT
- Title: Early Linguistic Pattern of Anxiety from Social Media Using Interpretable Linguistic Features: A Multi-Faceted Validation Study with Author-Disjoint Evaluation
- Authors: Arnab Das Utsa
- Abstract summary: Anxiety affects hundreds of millions of individuals globally, yet large-scale screening remains limited. This work presents a transparent approach to social media-based anxiety detection through linguistically interpretable feature-grounded modeling and cross-domain validation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anxiety affects hundreds of millions of individuals globally, yet large-scale screening remains limited. Social media language provides an opportunity for scalable detection, but current models often lack interpretability, keyword-robustness validation, and rigorous user-level data integrity. This work presents a transparent approach to social media-based anxiety detection through linguistically interpretable feature-grounded modeling and cross-domain validation. Using a substantial dataset of Reddit posts, we trained a logistic regression classifier on carefully curated subreddits for training, validation, and test splits. Comprehensive evaluation included feature ablation, keyword masking experiments, and varying-density difference analyses comparing anxious and control groups, along with external validation using clinically interviewed participants with diagnosed anxiety disorders. The model achieved strong performance while maintaining high accuracy even after sentiment removal or keyword masking. Early detection using minimal post history significantly outperformed random classification, and cross-domain analysis demonstrated strong consistency with clinical interview data. Results indicate that transparent linguistic features can support reliable, generalizable, and keyword-robust anxiety detection. The proposed framework provides a reproducible baseline for interpretable mental health screening across diverse online contexts.
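The abstract mentions two evaluation safeguards, author-disjoint splits and keyword masking, without spelling out how they work. The following is a minimal sketch of both ideas, not the paper's implementation. The helper names (`assign_split`, `author_disjoint_split`, `mask_keywords`), the split fractions, the keyword list, and the `[MASK]` token are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical keyword list; the paper's actual masking vocabulary is not given here.
ANXIETY_KEYWORDS = {"anxiety", "anxious", "panic", "worry", "worried"}


def assign_split(author, val_frac=0.1, test_frac=0.1):
    """Map an author to exactly one split, deterministically.

    Hashing the author name (rather than sampling posts at random) keeps
    every post by the same author in the same split.
    """
    digest = hashlib.sha256(author.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def author_disjoint_split(posts):
    """Group posts (dicts with an 'author' key) into author-disjoint splits."""
    splits = {"train": [], "val": [], "test": []}
    for post in posts:
        splits[assign_split(post["author"])].append(post)
    return splits


def mask_keywords(text, keywords=ANXIETY_KEYWORDS, token="[MASK]"):
    """Replace whole-word keyword matches (case-insensitive) with a mask token."""
    alternatives = "|".join(re.escape(k) for k in sorted(keywords))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(token, text)
```

A classifier would then be evaluated on the test split twice, once on the original text and once on `mask_keywords(text)`, to check how much performance depends on explicit anxiety vocabulary.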
Related papers
- Linguistic Indicators of Early Cognitive Decline in the DementiaBank Pitt Corpus: A Statistical and Machine Learning Study [4.417564179511245]
This study analyzes spontaneous speech transcripts from the DementiaBank Pitt Corpus using three linguistic representations.
Syntactic and grammatical features retain strong discriminative power even in the absence of lexical content.
This study supports the use of linguistically grounded features for transparent and reliable language-based cognitive screening.
arXiv Detail & Related papers (2026-02-11T16:53:57Z)
- A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data [36.77792803657935]
We argue that several important barriers to adoption can be addressed using Bayesian network modelling.
We evaluate a model for depression and anxiety symptom prediction from voice and speech features in large-scale datasets.
arXiv Detail & Related papers (2025-12-08T17:28:09Z)
- A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages [48.68444770923683]
We present the first comprehensive study of multilingual Chain-of-Thought (CoT) reasoning.
We measure language compliance, answer accuracy, and answer consistency when LRMs are prompt-hacked to think in a target language.
We find that the quality and effectiveness of thinking traces vary substantially depending on the prompt language.
arXiv Detail & Related papers (2025-10-10T17:06:50Z)
- Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.
Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.
Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
- Detecting anxiety and depression in dialogues: a multi-label and explainable approach [5.635300481123079]
Anxiety and depression are the most common mental health issues worldwide, affecting a non-negligible part of the population.
In this work, an entirely novel system for the multi-label classification of anxiety and depression is proposed.
arXiv Detail & Related papers (2024-12-23T15:29:46Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG).
MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.
We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
- Early stopping by correlating online indicators in neural networks [0.24578723416255746]
We propose a novel technique to identify overfitting phenomena when training the learner.
Our proposal exploits the correlation over time in a collection of online indicators.
As opposed to previous approaches focused on a single criterion, we take advantage of subsidiarities between independent assessments.
arXiv Detail & Related papers (2024-02-04T14:57:20Z)
- Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
arXiv Detail & Related papers (2024-02-02T17:06:03Z)
- A Simple and Flexible Modeling for Mental Disorder Detection by Learning from Clinical Questionnaires [0.2580765958706853]
We propose a novel approach that captures the semantic meanings directly from the text and compares them to symptom-related descriptions.
Our detailed analysis shows that the proposed model is effective at leveraging domain knowledge, transferable to other mental disorders, and providing interpretable detection results.
arXiv Detail & Related papers (2023-06-05T15:23:55Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.