Related papers: DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

URL: http://arxiv.org/abs/2601.00303v1
Date: Thu, 01 Jan 2026 10:44:38 GMT
Title: DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection
Authors: Yuxin Li, Xiangyu Zhang, Yifei Li, Zhiwei Guo, Haoyang Zhang, Eng Siong Chng, Cuntai Guan
Abstract summary: We present DepFlow, a depression-conditioned text-to-speech framework. A Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training. A flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity. A prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum.
Score: 54.209716321122194
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity while preserving content and speaker identity. Third, a prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum. Using DepFlow, we construct a Camouflage Depression-oriented Augmentation (CDoA) dataset that pairs depressed acoustic patterns with positive/neutral content from a sentiment-stratified text bank, creating acoustic-semantic mismatches underrepresented in natural data. Evaluated across three depression detection architectures, CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, consistently outperforming conventional augmentation strategies in depression detection. Beyond enhancing robustness, DepFlow provides a controllable synthesis platform for conversational systems and simulation-based evaluation, where real clinical data remains limited by ethical and coverage constraints.
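The second and third stages above (FiLM conditioning and prototype-based severity mapping) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`linear`, `film`, `prototype`, `severity_embedding`), the single-frame hidden vector, and the use of exactly two prototypes (a healthy mean and a depressed mean) blended by a hypothetical severity weight `alpha` are assumptions for illustration only. The abstract does not state how many prototypes DepFlow uses or how it moves between them.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> Vector:
    """Dense projection y = W x + b, with W stored row by row."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(hidden: Sequence[float], cond: Sequence[float],
         w_gamma: Matrix, b_gamma: Sequence[float],
         w_beta: Matrix, b_beta: Sequence[float]) -> Vector:
    """FiLM: per-channel scale (gamma) and shift (beta), both predicted from cond.

    `hidden` is one frame of decoder activations and `cond` is the (assumed)
    depression embedding; a real TTS decoder would apply this to every frame.
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]


def prototype(embeddings: Matrix) -> Vector:
    """Hypothetical prototype: element-wise mean of a group's embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def severity_embedding(p_low: Sequence[float], p_high: Sequence[float],
                       alpha: float) -> Vector:
    """Assumed linear blend of two prototypes: alpha=0 gives p_low, alpha=1 gives p_high."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * lo + alpha * hi for lo, hi in zip(p_low, p_high)]


# Usage with toy 2-D embeddings (made-up numbers, not DepFlow data).
p_healthy = prototype([[0.0, 1.0], [0.0, 3.0]])
p_depressed = prototype([[4.0, 2.0], [2.0, 2.0]])
cond = severity_embedding(p_healthy, p_depressed, alpha=0.5)
frame = film([1.0, 2.0], cond,
             w_gamma=[[1.0, 0.0], [0.0, 1.0]], b_gamma=[1.0, 1.0],
             w_beta=[[0.0, 0.0], [0.0, 0.0]], b_beta=[0.5, -0.5])
```

A common FiLM design choice is to initialise the gamma projection with zero weights and a bias of one, and the beta projection with zeros, so the layer starts as an identity map and the conditioning is learned gradually; whether DepFlow does this is not stated in the abstract.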

Related papers

ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media [48.56586765769052]
We present ReDepress, the first clinically validated social media dataset focused on relapse. Our framework draws on cognitive theories of depression, incorporating constructs such as attention bias, interpretation bias, memory bias and rumination. Our findings validate psychological theories in real-world textual data and underscore the potential of cognitive-informed computational methods for early relapse detection.
arXiv (2025-09-22T16:33:59Z)
DepressLLM: Interpretable domain-adapted language model for depression detection from real-world narratives [6.1211540596331755]
This study introduces DepressLLM, trained and evaluated on a novel corpus of 3,699 autobiographical narratives reflecting both happiness and distress. DepressLLM provides interpretable depression predictions and, via its Score-guided Token Probability Summation (SToPS) module, delivers both improved classification performance and reliable confidence estimates.
arXiv (2025-08-12T03:12:55Z)
Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection [18.797661194307683]
Previous studies have demonstrated that emotional features from a single acoustic sentiment label can enhance depression diagnosis accuracy. Individuals with depression might convey negative emotional content in an unexpectedly calm manner. This work is the first to incorporate emotional expression inconsistency information into depression detection.
arXiv (2024-12-09T02:52:52Z)
A BERT-Based Summarization approach for depression detection [1.7363112470483526]
Depression is a globally prevalent mental disorder with potentially severe repercussions if not addressed. Machine learning and artificial intelligence can autonomously detect depression indicators from diverse data sources. Our study proposes text summarization as a preprocessing technique to reduce the length and complexity of input texts.
arXiv (2024-09-13T02:14:34Z)
Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection [6.561362931802501]
Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability.
arXiv (2023-09-23T20:48:58Z)
The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection [69.88072583383085]
This work shows that depression changes the correlation between features extracted from speech. Using such an insight can improve the training speed and performance of depression detectors based on SVMs and LSTMs.
arXiv (2023-07-06T09:54:35Z)
Depression detection in social media posts using affective and social norm features [84.12658971655253]
We propose a deep architecture for depression detection from social media posts. We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme. The inclusion of the proposed features yields state-of-the-art results in both settings.
arXiv (2023-03-24T21:26:27Z)
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data [65.28160163774274]
We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.
arXiv (2022-11-09T14:48:13Z)
Deep Multi-task Learning for Depression Detection and Prediction in Longitudinal Data [50.02223091927777]
Depression is among the most prevalent mental disorders, affecting millions of people of all ages globally. Machine learning techniques have proven effective in enabling automated detection and prediction of depression for early intervention and treatment. We introduce a novel deep multi-task recurrent neural network to tackle this challenge, in which depression classification is jointly optimized with two auxiliary tasks.
arXiv (2020-12-05T05:14:14Z)
Multimodal Depression Severity Prediction from medical bio-markers using Machine Learning Tools and Technologies [0.0]
Depression has been a leading cause of mental illness across the world. The use of behavioural cues to automate depression diagnosis and stage prediction has increased in recent years. The absence of labelled behavioural datasets and the vast number of possible variations make this task a major challenge.
arXiv (2020-09-11T20:44:28Z)
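The idea summarised for "The Relationship Between Speech Features Changes When You Get Depressed" (that depression changes the correlation between speech features) can be turned into a fixed-length feature vector in the way sketched below. This is a minimal illustration under stated assumptions, not that paper's method: the frame-level T x D feature matrix, the choice of Pearson correlation, the upper-triangle ordering, and the 0.0 placeholder for constant features are all assumptions, and the helper names `pearson` and `correlation_features` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long feature trajectories."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of summed squared deviations; the 1/n factors cancel in the ratio.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0  # undefined for a constant feature; 0.0 is a neutral placeholder
    return cov / (spread_x * spread_y)


def correlation_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Turn a T x D matrix of frame-level features into D*(D-1)/2 pairwise correlations.

    Pairs are ordered (0,1), (0,2), ..., (D-2, D-1), i.e. the upper triangle
    of the correlation matrix read row by row.
    """
    columns = list(zip(*frames))
    return [pearson(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)]


# Usage: 3 frames x 3 features (made-up values, not real speech features).
feats = correlation_features([[1.0, 2.0, 3.0],
                              [2.0, 4.0, 1.0],
                              [3.0, 6.0, 2.0]])
```

Because the output length depends only on D, not on recording length, such a vector could be passed directly to an SVM or computed per window as an LSTM input sequence, the two detector families named in that summary.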

This list is automatically generated from the titles and abstracts of the papers on this site.
