MDD-Net: Multimodal Depression Detection through Mutual Transformer
- URL: http://arxiv.org/abs/2508.08093v1
- Date: Mon, 11 Aug 2025 15:32:56 GMT
- Title: MDD-Net: Multimodal Depression Detection through Mutual Transformer
- Authors: Md Rezwanul Haque, Md. Milon Islam, S M Taslim Uddin Raju, Hamdi Altaheri, Lobna Nassar, Fakhri Karray
- Abstract summary: Depression is a major mental health condition that severely impacts the emotional and physical well-being of individuals. A Multimodal Depression Detection Network (MDD-Net) is proposed in this work, where mutual transformers extract and fuse multimodal features for efficient depression detection. The developed network surpasses the state-of-the-art by up to 17.37% in F1-score.
- Score: 1.18749525824656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression is a major mental health condition that severely impacts the emotional and physical well-being of individuals. The ease of collecting data from social media platforms has attracted significant interest in using this information for mental health research. This work proposes a Multimodal Depression Detection Network (MDD-Net) that uses acoustic and visual data obtained from social media networks, exploiting mutual transformers to extract and fuse multimodal features for efficient depression detection. MDD-Net consists of four core modules: an acoustic feature extraction module for retrieving relevant acoustic attributes, a visual feature extraction module for extracting significant high-level patterns, a mutual transformer for computing correlations among the generated features and fusing them across modalities, and a detection layer that detects depression from the fused feature representations. Extensive experiments on the multimodal D-Vlog dataset reveal that the proposed network surpasses the state-of-the-art by up to 17.37% in F1-score, demonstrating the superior performance of the proposed system. The source code is accessible at https://github.com/rezwanh001/Multimodal-Depression-Detection.
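The abstract names the four modules but not how the mutual transformer couples the two streams. Below is a minimal PyTorch sketch of the mutual cross-attention pattern it describes, in which each modality queries the other before the fused representations reach the detection layer; all dimensions, names, and the pooling choice are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class MutualTransformerFusion(nn.Module):
    """Sketch: each modality cross-attends to the other, then the fused
    representations are pooled, concatenated, and classified."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # acoustic queries visual
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)  # visual queries acoustic
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)
        self.detect = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 2))  # depressed / not depressed

    def forward(self, acoustic, visual):
        # acoustic: (B, Ta, dim), visual: (B, Tv, dim) -- pre-extracted features
        a_fused, _ = self.a2v(acoustic, visual, visual)    # audio-to-video correlations
        v_fused, _ = self.v2a(visual, acoustic, acoustic)  # video-to-audio correlations
        a = self.norm_a(acoustic + a_fused).mean(dim=1)    # residual + temporal pooling
        v = self.norm_v(visual + v_fused).mean(dim=1)
        return self.detect(torch.cat([a, v], dim=-1))

# toy usage on D-Vlog-style sequence features
model = MutualTransformerFusion()
logits = model(torch.randn(8, 100, 256), torch.randn(8, 100, 256))
```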
Related papers
- Exploring Machine Learning and Language Models for Multimodal Depression Detection [8.357574678947245]
This paper presents our approach to the first Multimodal Personality-Aware Depression Detection Challenge. We explore and compare the performance of XGBoost, transformer-based architectures, and large language models (LLMs) on audio, video, and text features. Our results highlight the strengths and limitations of each type of model in capturing depression-related signals across modalities.
arXiv Detail & Related papers (2025-08-28T14:07:07Z)
- MMFformer: Multimodal Fusion Transformer Network for Depression Detection [1.18749525824656]
Depression is a serious mental health illness that significantly affects an individual's well-being and quality of life. This paper introduces a multimodal detection network designed to retrieve depressive spatio-temporal high-level patterns from multimodal social media information. The proposed network is assessed on two large-scale depression detection datasets.
arXiv Detail & Related papers (2025-08-08T21:03:29Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
We propose the Multimodal Pretraining DEL-Fusion model (MPDF).
We develop pretraining tasks that apply contrastive objectives between different compound representations and their text descriptions (a generic sketch follows this entry).
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
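The MPDF summary above mentions contrastive objectives between compound representations and their text descriptions. A generic symmetric InfoNCE loss of the kind typically used for such pretraining is sketched below; the function name and temperature are illustrative, and the paper's actual objectives may differ.

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(compound_emb, text_emb, temperature=0.07):
    """Matched compound/text pairs are pulled together; every other
    pairing in the batch serves as a negative."""
    c = F.normalize(compound_emb, dim=-1)   # (B, D)
    t = F.normalize(text_emb, dim=-1)       # (B, D)
    logits = c @ t.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(c.size(0), device=c.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

# toy usage with random embeddings
loss = symmetric_info_nce(torch.randn(16, 128), torch.randn(16, 128))
```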
- MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection [5.506046101113427]
We introduce a Multimodal Object-Oriented Graph Attention Model (MOGAM) for detecting depression in social media.
To ensure that our model can capture authentic symptoms of depression, we only include vlogs from users with a clinical diagnosis.
MOGAM achieved an accuracy of 0.871 and an F1-score of 0.888.
arXiv Detail & Related papers (2024-03-21T07:45:58Z)
- CANAMRF: An Attention-Based Model for Multimodal Depression Detection [7.266707571724883]
We present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection.
CANAMRF consists of a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module.
arXiv Detail & Related papers (2024-01-04T12:08:16Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality (a generic sketch follows this entry).
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
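The IMQ idea above, learnable queries that adaptively pool global contextual cues within a modality, resembles a common learned-query aggregation pattern; a minimal sketch under that assumption (not the paper's exact design) is shown below.

```python
import torch
import torch.nn as nn

class LearnedQueryAggregator(nn.Module):
    """Learnable query vectors cross-attend to one modality's tokens,
    pooling global context into a fixed number of vectors."""
    def __init__(self, dim=256, num_queries=4, heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):                       # tokens: (B, N, dim)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)     # (B, num_queries, dim)
        return pooled

agg = LearnedQueryAggregator()
cues = agg(torch.randn(2, 196, 256))                 # e.g. 14x14 image tokens
```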
- Attention-Based Acoustic Feature Fusion Network for Depression Detection [11.972591489278988]
We present the Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection.
ABAFnet combines four different acoustic features in a single deep learning model, effectively integrating multi-tiered features.
We present a novel weight adjustment module for late fusion that boosts performance by effectively synthesizing these features (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-08-24T00:31:51Z)
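ABAFnet's weight adjustment module for late fusion is not specified in the summary; a minimal sketch of one common realization, a learned softmax weight per feature branch over branch-wise logits, is given below (branch dimensions are made up for illustration).

```python
import torch
import torch.nn as nn

class WeightedLateFusion(nn.Module):
    """Each feature branch gets its own classifier head; the branch logits
    are combined with softmax-normalized learned weights."""
    def __init__(self, branch_dims, num_classes=2):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in branch_dims)
        self.branch_weights = nn.Parameter(torch.zeros(len(branch_dims)))  # uniform at init

    def forward(self, branch_feats):                 # list of (B, d_i) tensors
        w = torch.softmax(self.branch_weights, dim=0)
        logits = [head(f) for head, f in zip(self.heads, branch_feats)]
        return sum(wi * li for wi, li in zip(w, logits))

# four acoustic feature branches, mirroring the summary
fusion = WeightedLateFusion([128, 64, 40, 256])
out = fusion([torch.randn(8, d) for d in (128, 64, 40, 256)])
```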
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- Automatic Depression Detection via Learning and Fusing Features from Visual Cues [42.71590961896457]
We propose a novel Automatic Depression Detection (ADD) method via learning and fusing features from visual cues.
Our method achieves state-of-the-art performance on the DAIC-WOZ dataset compared to other visual-feature-based methods.
arXiv Detail & Related papers (2022-03-01T09:28:12Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
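MMLatch's feedback mechanism routes high-level crossmodal information back to re-weight low-level features within the same forward pass; a minimal sketch of such a top-down gate, with dimensions chosen arbitrarily, is below.

```python
import torch
import torch.nn as nn

class TopDownGate(nn.Module):
    """A high-level crossmodal summary produces a sigmoid mask that
    re-weights a modality's low-level features before re-encoding."""
    def __init__(self, low_dim=74, high_dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(high_dim, low_dim), nn.Sigmoid())

    def forward(self, low_feats, high_summary):
        # low_feats: (B, T, low_dim); high_summary: (B, high_dim)
        mask = self.gate(high_summary).unsqueeze(1)   # (B, 1, low_dim)
        return low_feats * mask                       # feature-wise top-down masking

gate = TopDownGate()
masked = gate(torch.randn(4, 50, 74), torch.randn(4, 256))
```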
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The main challenge in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)