Attention-Based Acoustic Feature Fusion Network for Depression Detection
- URL: http://arxiv.org/abs/2308.12478v1
- Date: Thu, 24 Aug 2023 00:31:51 GMT
- Title: Attention-Based Acoustic Feature Fusion Network for Depression Detection
- Authors: Xiao Xu, Yang Wang, Xinru Wei, Fei Wang, Xizhe Zhang
- Abstract summary: We present the Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection.
ABAFnet combines four different acoustic features into a comprehensive deep learning model, thereby effectively integrating and blending multi-tiered features.
We present a novel weight adjustment module for late fusion that boosts performance by efficaciously synthesizing these features.
- Score: 11.972591489278988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression, a common mental disorder, significantly influences individuals
and imposes considerable societal impacts. The complexity and heterogeneity of
the disorder necessitate prompt and effective detection, which nonetheless
poses a difficult challenge. This situation highlights an urgent requirement
for improved detection methods. Exploiting auditory data through advanced
machine learning paradigms presents promising research directions. Yet,
existing techniques mainly rely on single-dimensional feature models,
potentially neglecting the abundance of information hidden in various speech
characteristics. To rectify this, we present the novel Attention-Based Acoustic
Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines
four different acoustic features into a comprehensive deep learning model,
thereby effectively integrating and blending multi-tiered features. We present
a novel weight adjustment module for late fusion that boosts performance by
efficaciously synthesizing these features. The effectiveness of our approach is
confirmed via extensive validation on two clinical speech databases, CNRAC and
CS-NRAC, thereby outperforming previous methods in depression detection and
subtype classification. Further in-depth analysis confirms the key role of each
feature and highlights the importance of MFCC-related features in speech-based
depression detection.
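The abstract does not specify how the weight adjustment module for late fusion operates internally. A minimal sketch of one plausible reading, assuming each of the four acoustic feature branches produces its own embedding and a learnable logit per branch is normalized with a softmax into fusion weights (the branch names and dimensions below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weighted_late_fusion(branch_embeddings, fusion_logits):
    """Fuse per-feature embeddings with learnable scalar weights.

    branch_embeddings: list of four (d,) arrays, one per acoustic
        feature type (e.g. MFCC-based, spectral, prosodic, ...;
        hypothetical names for illustration).
    fusion_logits: (4,) array of learnable parameters; the softmax
        turns them into attention-style fusion weights summing to 1.
    """
    weights = softmax(fusion_logits)       # (4,), non-negative, sums to 1
    stacked = np.stack(branch_embeddings)  # (4, d)
    return weights @ stacked               # (d,) fused embedding

# Toy usage: four 8-dim branch embeddings, uniform initial logits,
# so the fused embedding is simply the mean of the branches.
rng = np.random.default_rng(0)
branches = [rng.standard_normal(8) for _ in range(4)]
fused = weighted_late_fusion(branches, np.zeros(4))
```

In a trained model the logits would be updated by backpropagation, letting the network emphasize whichever feature branch (such as the MFCC-related one highlighted above) is most informative.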
Related papers
- A Depression Detection Method Based on Multi-Modal Feature Fusion Using Cross-Attention [3.4872769952628926]
Depression affects approximately 3.8% of the global population.
Over 75% of individuals in low- and middle-income countries remain untreated.
This paper introduces a novel method for detecting depression based on multi-modal feature fusion utilizing cross-attention.
arXiv Detail & Related papers (2024-07-02T13:13:35Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- On the Onset of Robust Overfitting in Adversarial Training [66.27055915739331]
Adversarial Training (AT) is a widely-used algorithm for building robust neural networks.
AT suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear.
arXiv Detail & Related papers (2023-10-01T07:57:03Z)
- DEPAC: a Corpus for Depression and Anxiety Detection from Speech [3.2154432166999465]
We introduce a novel mental distress analysis audio dataset DEPAC, labeled based on established thresholds on depression and anxiety screening tools.
This large dataset comprises multiple speech tasks per individual, as well as relevant demographic information.
We present a feature set consisting of hand-curated acoustic and linguistic features, which were found effective in identifying signs of mental illnesses in human speech.
arXiv Detail & Related papers (2023-06-20T12:21:06Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Automatic Depression Detection via Learning and Fusing Features from Visual Cues [42.71590961896457]
We propose a novel Automatic Depression Detection (ADD) method via learning and fusing features from visual cues.
Our method achieves the state-of-the-art performance on the DAIC_WOZ dataset compared to other visual-feature-based methods.
arXiv Detail & Related papers (2022-03-01T09:28:12Z)
- Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs [11.34426502082293]
We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker has Alzheimer's Disease.
Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 when predicting MMSE cognitive scores.
arXiv Detail & Related papers (2021-06-29T19:24:29Z)
- Multimodal Depression Severity Prediction from medical bio-markers using Machine Learning Tools and Technologies [0.0]
Depression has been a leading cause of mental-health illnesses across the world.
In recent years, the use of behavioural cues to automate depression diagnosis and stage prediction has increased.
The absence of labelled behavioural datasets and a vast amount of possible variations prove to be a major challenge in accomplishing the task.
arXiv Detail & Related papers (2020-09-11T20:44:28Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
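The entry above describes representing hand-crafted features through correlations of feature pairs, without giving details. A minimal sketch of one plausible reading, assuming per-frame features are summarized by the upper triangle of their Pearson correlation matrix (the frame counts and feature dimensions below are illustrative assumptions):

```python
import numpy as np

def correlation_representation(features):
    """Compact representation from pairwise feature correlations.

    features: (n_frames, n_features) array of hand-crafted features
        computed per audio frame.
    Returns the upper triangle (excluding the diagonal) of the
    Pearson correlation matrix, flattened into a vector of
    n_features * (n_features - 1) / 2 values.
    """
    corr = np.corrcoef(features, rowvar=False)  # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)        # indices above the diagonal
    return corr[iu]

# Toy usage: 100 frames of 6 features -> a 15-dim representation,
# independent of the number of frames.
rng = np.random.default_rng(1)
rep = correlation_representation(rng.standard_normal((100, 6)))
```

Because the output size depends only on the number of features, not the number of frames, such a representation is compact, consistent with the dimensionality claim in the summary above.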
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.