READ-Net: Clarifying Emotional Ambiguity via Adaptive Feature Recalibration for Audio-Visual Depression Detection
- URL: http://arxiv.org/abs/2601.14651v1
- Date: Wed, 21 Jan 2026 04:55:10 GMT
- Title: READ-Net: Clarifying Emotional Ambiguity via Adaptive Feature Recalibration for Audio-Visual Depression Detection
- Authors: Chenglizhao Chen, Boze Li, Mengke Song, Dehao Feng, Xinyu Liu, Shanchen Pang, Jufeng Yang, Hui Yu
- Abstract summary: Depression is a severe global mental health issue that impairs daily functioning and overall quality of life. We propose READ-Net, the first audio-visual depression detection framework explicitly designed to resolve Emotional Ambiguity. READ-Net identifies and preserves depression-relevant cues within emotional features while adaptively filtering out irrelevant emotional noise.
- Score: 44.6096152592417
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Depression is a severe global mental health issue that impairs daily functioning and overall quality of life. Although recent audio-visual approaches have improved automatic depression detection, methods that ignore emotional cues often fail to capture subtle depressive signals hidden within emotional expressions. Conversely, those incorporating emotions frequently confuse transient emotional expressions with stable depressive symptoms in feature representations, a phenomenon termed *Emotional Ambiguity*, leading to detection errors. To address this issue, we propose READ-Net, the first audio-visual depression detection framework explicitly designed to resolve Emotional Ambiguity through Adaptive Feature Recalibration (AFR). The core insight of AFR is to dynamically adjust the weights of emotional features to enhance depression-related signals. Rather than overlooking or naively combining emotional information, READ-Net identifies and preserves depression-relevant cues within emotional features while adaptively filtering out irrelevant emotional noise. This recalibration strategy clarifies feature representations and mitigates the persistent challenge of emotional interference. Additionally, READ-Net can be easily integrated into existing frameworks for improved performance. Extensive evaluations on three publicly available datasets show that READ-Net outperforms state-of-the-art methods, with average gains of 4.55% in accuracy and 1.26% in F1-score, demonstrating its robustness to emotional disturbances and its benefit to audio-visual depression detection.
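The abstract describes AFR only at a high level. As one plausible reading, the sketch below gates emotional features by their agreement with a depression-oriented stream; the module name, gating form, and residual fusion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureRecalibration(nn.Module):
    """Hypothetical AFR sketch: learn per-dimension gates that up-weight
    depression-relevant emotional cues and suppress transient emotional
    noise. All names and shapes are illustrative assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        # Gates are conditioned on both the emotional and depressive streams.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, emo: torch.Tensor, dep: torch.Tensor) -> torch.Tensor:
        # emo, dep: (batch, dim) features from the emotion and depression branches.
        g = self.gate(torch.cat([emo, dep], dim=-1))  # gate values in [0, 1]
        # Keep emotional cues that agree with the depressive signal,
        # attenuate the rest, and fuse residually with the depression stream.
        return dep + g * emo
```

A gate of this shape can sit between existing emotion and depression branches, which is consistent with the abstract's claim that READ-Net integrates easily into existing frameworks.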
Related papers
- ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning [67.22219034602514]
We introduce ADEPT (Agentic Decoding of Emotion via Evidence Probing Tools), a framework that reframes emotion recognition as a multi-turn inquiry process. ADEPT transforms an SLLM into an agent that maintains an evolving candidate emotion set and adaptively invokes dedicated semantic and acoustic probing tools. We show that ADEPT improves primary emotion accuracy in most settings while substantially improving minor emotion characterization.
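A minimal schematic of such a multi-turn inquiry loop, with an assumed tool interface (each tool returns the subset of candidate emotions its evidence still supports), might look like:

```python
def adept_style_inference(utterance, probe_tools, candidates, max_turns=5):
    """Schematic multi-turn inquiry: maintain an evolving candidate emotion
    set and invoke probing tools until it stabilizes. The tool interface
    and pruning rule are illustrative assumptions, not ADEPT's actual API."""
    candidates = set(candidates)
    for _ in range(max_turns):
        # Each semantic/acoustic tool returns the candidates it supports.
        supported = set()
        for tool in probe_tools:
            supported |= set(tool(utterance, candidates))
        pruned = candidates & supported
        if not pruned or pruned == candidates:
            break  # evidence exhausted or candidate set stable
        candidates = pruned
    return candidates
```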
arXiv Detail & Related papers (2026-02-13T08:33:37Z)
- DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection [54.209716321122194]
We present DepFlow, a depression-conditioned text-to-speech framework. A Depression Acoustic Camouflage module learns speaker- and content-invariant depression embeddings through adversarial training. A flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity. A prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum.
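FiLM (feature-wise linear modulation) is a standard conditioning mechanism; a generic sketch of injecting a depression embedding this way (dimensions and placement are assumptions, not DepFlow's code) is:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Generic FiLM layer: scale and shift backbone features using a
    conditioning embedding (here, a depression embedding)."""

    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return feats * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```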
arXiv Detail & Related papers (2026-01-01T10:44:38Z)
- EmoCAST: Emotional Talking Portrait via Emotive Text Description [56.42674612728354]
EmoCAST is a diffusion-based framework for precise text-driven emotional synthesis. In appearance modeling, emotional prompts are integrated through a text-guided decoupled emotive module. EmoCAST achieves state-of-the-art performance in generating realistic, emotionally expressive, and audio-synchronized talking-head videos.
arXiv Detail & Related papers (2025-08-28T10:02:06Z)
- Neural Responses to Affective Sentences Reveal Signatures of Depression [18.304785509577766]
Major Depressive Disorder (MDD) is a highly prevalent mental health condition, and a deeper understanding of its neurocognitive foundations is essential. We investigate how depression alters the temporal dynamics of emotional processing by measuring neural responses to self-referential affective sentences. Our results reveal significant group-level differences in neural activity during sentence viewing, suggesting disrupted integration of emotional and self-referential information in depression.
arXiv Detail & Related papers (2025-06-06T17:09:08Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and cooperating emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
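As an illustration of jointly modeling audio-visual emotional cues through cross-modal attention, one minimal sketch (attention direction, pooling, and sizes are assumptions, not DICE-Talk's design) is:

```python
import torch.nn as nn

class CrossModalEmotionEmbedder(nn.Module):
    """Sketch: audio tokens query visual tokens, and the fused sequence is
    pooled into one emotion embedding per clip. Layout is an assumption."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, audio_tokens, visual_tokens):
        # audio_tokens: (B, Ta, dim); visual_tokens: (B, Tv, dim)
        fused, _ = self.attn(audio_tokens, visual_tokens, visual_tokens)
        return self.proj(fused.mean(dim=1))  # (B, dim) emotion embedding
```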
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- LEL: A Novel Lipschitz Continuity-constrained Ensemble Learning Model for EEG-based Emotion Recognition [6.9292405290420005]
We introduce LEL (Lipschitz continuity-constrained Ensemble Learning), a novel framework that enhances EEG-based emotion recognition. Experimental results on three public benchmark datasets demonstrate LEL's state-of-the-art performance.
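One common way to impose a Lipschitz-continuity constraint on a network is spectral normalization, which caps each linear layer's operator norm at 1; whether LEL uses this exact mechanism is an assumption.

```python
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lipschitz_classifier(in_dim: int, hidden: int, n_classes: int) -> nn.Sequential:
    """A Lipschitz-bounded MLP member for an ensemble: spectral norm caps
    each linear map's Lipschitz constant at 1, and ReLU is itself 1-Lipschitz,
    so the whole composition is 1-Lipschitz."""
    return nn.Sequential(
        spectral_norm(nn.Linear(in_dim, hidden)),
        nn.ReLU(),
        spectral_norm(nn.Linear(hidden, n_classes)),
    )
```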
arXiv Detail & Related papers (2025-04-12T09:41:23Z)
- Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection [18.797661194307683]
Previous studies have demonstrated that emotional features from a single acoustic sentiment label can enhance depression diagnosis accuracy. However, individuals with depression might convey negative emotional content in an unexpectedly calm manner. This work is the first to incorporate emotional expression inconsistency information into depression detection.
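One simple way to quantify acoustic-textual emotional inconsistency is a divergence between the emotion distributions predicted from speech and from its transcript; the metric below is an illustrative assumption, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def emotional_inconsistency(acoustic_logits: torch.Tensor,
                            textual_logits: torch.Tensor) -> torch.Tensor:
    """Per-utterance KL divergence between textual and acoustic emotion
    posteriors; large values flag, e.g., negative content delivered calmly."""
    log_p_acoustic = F.log_softmax(acoustic_logits, dim=-1)
    p_textual = F.softmax(textual_logits, dim=-1)
    # KL(textual || acoustic), summed over the emotion classes.
    return F.kl_div(log_p_acoustic, p_textual, reduction="none").sum(dim=-1)
```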
arXiv Detail & Related papers (2024-12-09T02:52:52Z)
- Catching Elusive Depression via Facial Micro-Expression Recognition [17.236980932143855]
Depression is a common mental health disorder that can cause persistent symptoms, including a continuously depressed mood.
One category of depression is Concealed Depression, where patients intentionally or unintentionally hide their genuine emotions.
We propose to diagnose concealed depression by using facial micro-expressions to detect and recognize underlying true emotions.
arXiv Detail & Related papers (2023-07-29T01:51:17Z)
- Climate and Weather: Inspecting Depression Detection via Emotion Recognition [25.290414205116107]
This paper uses pretrained features extracted from an emotion recognition model to build a multimodal depression detection system. The proposed emotion transfer improves depression detection performance on DAIC-WOZ and also increases training stability.
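A minimal sketch of this transfer setup, assuming a frozen pretrained emotion encoder with a simple feature interface (the encoder API and classifier head are illustrative assumptions):

```python
import torch
import torch.nn as nn

class EmotionTransferDetector(nn.Module):
    """Sketch: pretrained emotion features feed a trainable depression head."""

    def __init__(self, emotion_encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = emotion_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the pretrained emotion features fixed
        self.head = nn.Linear(feat_dim, 2)  # depressed vs. control

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)  # (batch, feat_dim)
        return self.head(feats)
```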
arXiv Detail & Related papers (2022-04-29T13:44:22Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle speaker style from linguistic content and encode it into a style embedding in a continuous space, which forms the prototype of the emotion embedding.
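As a toy illustration of intensity control in such a continuous style space (the interpolation rule is an assumption, not the paper's formulation):

```python
import torch

def set_emotion_intensity(neutral_style: torch.Tensor,
                          emotion_prototype: torch.Tensor,
                          intensity: float) -> torch.Tensor:
    """Move from the neutral style embedding toward an emotion prototype;
    intensity in [0, 1] controls how strongly the emotion is expressed."""
    intensity = max(0.0, min(1.0, intensity))
    return (1.0 - intensity) * neutral_style + intensity * emotion_prototype
```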
arXiv Detail & Related papers (2022-01-10T02:11:25Z)