Attention-Based Acoustic Feature Fusion Network for Depression Detection
- URL: http://arxiv.org/abs/2308.12478v1
- Date: Thu, 24 Aug 2023 00:31:51 GMT
- Title: Attention-Based Acoustic Feature Fusion Network for Depression Detection
- Authors: Xiao Xu, Yang Wang, Xinru Wei, Fei Wang, Xizhe Zhang
- Abstract summary: We present the Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection.
ABAFnet combines four different acoustic features into a comprehensive deep learning model, thereby effectively integrating and blending multi-tiered features.
We present a novel weight adjustment module for late fusion that boosts performance by efficaciously synthesizing these features.
- Score: 11.972591489278988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression, a common mental disorder, significantly influences individuals
and imposes considerable societal impacts. The complexity and heterogeneity of
the disorder necessitate prompt and effective detection, which nonetheless
poses a difficult challenge. This situation highlights an urgent requirement
for improved detection methods. Exploiting auditory data through advanced
machine learning paradigms presents promising research directions. Yet,
existing techniques mainly rely on single-dimensional feature models,
potentially neglecting the abundance of information hidden in various speech
characteristics. To rectify this, we present the novel Attention-Based Acoustic
Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines
four different acoustic features into a comprehensive deep learning model,
thereby effectively integrating and blending multi-tiered features. We present
a novel weight adjustment module for late fusion that boosts performance by
efficaciously synthesizing these features. The effectiveness of our approach is
confirmed via extensive validation on two clinical speech databases, CNRAC and
CS-NRAC, thereby outperforming previous methods in depression detection and
subtype classification. Further in-depth analysis confirms the key role of each
feature and highlights the importance of MFCC-related features in speech-based
depression detection.
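The abstract does not specify how the weight adjustment module for late fusion operates internally. A minimal sketch of one plausible reading, assuming each of the four acoustic feature branches produces its own embedding and a learnable logit per branch is normalized with a softmax into fusion weights (the branch names and dimensions below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weighted_late_fusion(branch_embeddings, fusion_logits):
    """Fuse per-feature embeddings with learnable scalar weights.

    branch_embeddings: list of four (d,) arrays, one per acoustic
        feature type (e.g. MFCC-based, spectral, prosodic, ...;
        hypothetical names for illustration).
    fusion_logits: (4,) array of learnable parameters; the softmax
        turns them into attention-style fusion weights summing to 1.
    """
    weights = softmax(fusion_logits)       # (4,), non-negative, sums to 1
    stacked = np.stack(branch_embeddings)  # (4, d)
    return weights @ stacked               # (d,) fused embedding

# Toy usage: four 8-dim branch embeddings, uniform initial logits,
# so the fused embedding is simply the mean of the branches.
rng = np.random.default_rng(0)
branches = [rng.standard_normal(8) for _ in range(4)]
fused = weighted_late_fusion(branches, np.zeros(4))
```

In a trained model the logits would be updated by backpropagation, letting the network emphasize whichever feature branch (such as the MFCC-related one highlighted above) is most informative.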
Related papers
- A Depression Detection Method Based on Multi-Modal Feature Fusion Using Cross-Attention [3.4872769952628926]
Depression affects approximately 3.8% of the global population.
Over 75% of individuals in low- and middle-income countries remain untreated.
This paper introduces a novel method for detecting depression based on multi-modal feature fusion utilizing cross-attention.
arXiv Detail & Related papers (2024-07-02T13:13:35Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- On the Onset of Robust Overfitting in Adversarial Training [66.27055915739331]
Adversarial Training (AT) is a widely-used algorithm for building robust neural networks.
AT suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear.
arXiv Detail & Related papers (2023-10-01T07:57:03Z)
- DEPAC: a Corpus for Depression and Anxiety Detection from Speech [3.2154432166999465]
We introduce a novel mental distress analysis audio dataset DEPAC, labeled based on established thresholds on depression and anxiety screening tools.
This large dataset comprises multiple speech tasks per individual, as well as relevant demographic information.
We present a feature set consisting of hand-curated acoustic and linguistic features, which were found effective in identifying signs of mental illnesses in human speech.
arXiv Detail & Related papers (2023-06-20T12:21:06Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Automatic Depression Detection via Learning and Fusing Features from Visual Cues [42.71590961896457]
We propose a novel Automatic Depression Detection (ADD) method via learning and fusing features from visual cues.
Our method achieves the state-of-the-art performance on the DAIC_WOZ dataset compared to other visual-feature-based methods.
arXiv Detail & Related papers (2022-03-01T09:28:12Z)
- Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs [11.34426502082293]
We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker has Alzheimer's Disease.
Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 when predicting MMSE cognitive scores.
arXiv Detail & Related papers (2021-06-29T19:24:29Z)
- Multimodal Depression Severity Prediction from medical bio-markers using Machine Learning Tools and Technologies [0.0]
Depression has been a leading cause of mental-health illnesses across the world.
In recent years, the use of behavioural cues to automate depression diagnosis and stage prediction has increased.
The absence of labelled behavioural datasets and a vast amount of possible variations prove to be a major challenge in accomplishing the task.
arXiv Detail & Related papers (2020-09-11T20:44:28Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
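The entry above describes representing hand-crafted features through correlations of feature pairs, without giving details. A minimal sketch of one plausible reading, assuming per-frame features are summarized by the upper triangle of their Pearson correlation matrix (the frame counts and feature dimensions below are illustrative assumptions):

```python
import numpy as np

def correlation_representation(features):
    """Compact representation from pairwise feature correlations.

    features: (n_frames, n_features) array of hand-crafted features
        computed per audio frame.
    Returns the upper triangle (excluding the diagonal) of the
    Pearson correlation matrix, flattened into a vector of
    n_features * (n_features - 1) / 2 values.
    """
    corr = np.corrcoef(features, rowvar=False)  # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)        # indices above the diagonal
    return corr[iu]

# Toy usage: 100 frames of 6 features -> a 15-dim representation,
# independent of the number of frames.
rng = np.random.default_rng(1)
rep = correlation_representation(rng.standard_normal((100, 6)))
```

Because the output size depends only on the number of features, not the number of frames, such a representation is compact, consistent with the dimensionality claim in the summary above.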
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.