DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection
- URL: http://arxiv.org/abs/2409.15936v1
- Date: Tue, 24 Sep 2024 09:58:07 GMT
- Title: DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection
- Authors: Jiaxin Ye, Junping Zhang, Hongming Shan
- Abstract summary: Depression is a common mental disorder that affects millions of people worldwide.
We propose an audio-visual progressive fusion Mamba for multimodal depression detection, termed DepMamba.
- Score: 37.701518424351505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression is a common mental disorder that affects millions of people worldwide. Although promising, current multimodal methods hinge on aligned or aggregated multimodal fusion, suffering from two significant limitations: (i) inefficient long-range temporal modeling, and (ii) sub-optimal multimodal fusion between intermodal fusion and intramodal processing. In this paper, we propose an audio-visual progressive fusion Mamba for multimodal depression detection, termed DepMamba. DepMamba features two core designs: hierarchical contextual modeling and progressive multimodal fusion. On the one hand, hierarchical modeling introduces convolutional neural networks and Mamba to extract local-to-global features within long-range sequences. On the other hand, the progressive fusion first presents a multimodal collaborative State Space Model (SSM) that extracts intermodal and intramodal information for each modality, and then utilizes a multimodal enhanced SSM for modality cohesion. Extensive experimental results on two large-scale depression datasets demonstrate the superior performance of our DepMamba over existing state-of-the-art methods. Code is available at https://github.com/Jiaxin-Ye/DepMamba.
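The abstract outlines a two-stage design: per-modality hierarchical encoders (a CNN for local features and Mamba for global context), followed by a collaborative SSM that mixes intermodal and intramodal information and an enhanced SSM for modality cohesion. The sketch below is a minimal, hypothetical rendering of that data flow, not the authors' implementation: a GRU stands in for the Mamba/SSM blocks so the example runs with PyTorch alone, and all module names, hidden sizes, and the audio/video input dimensions are assumptions. See the linked repository for the actual code.

```python
# Conceptual sketch of a DepMamba-style pipeline as described in the abstract:
# (1) hierarchical local-to-global encoding per modality,
# (2) a collaborative stage conditioning each modality on both streams,
# (3) an enhanced fusion stage for modality cohesion, then classification.
# A GRU is used as a stand-in for the Mamba/SSM blocks; all names are hypothetical.
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    """Local features via Conv1d, global context via a recurrent stand-in for Mamba."""

    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.local = nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1)
        self.global_ctx = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for a Mamba block

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, in_dim)
        h = self.local(x.transpose(1, 2)).transpose(1, 2)  # (B, T, hidden)
        out, _ = self.global_ctx(h)
        return out


class ProgressiveFusion(nn.Module):
    """Collaborative stage (per-modality, fed both streams) followed by an enhanced fusion stage."""

    def __init__(self, hidden: int, n_classes: int = 2):
        super().__init__()
        self.audio_enc = HierarchicalEncoder(64, hidden)    # assumed audio feature size
        self.video_enc = HierarchicalEncoder(128, hidden)   # assumed visual feature size
        self.collab_a = nn.GRU(2 * hidden, hidden, batch_first=True)  # intra + inter info for audio
        self.collab_v = nn.GRU(2 * hidden, hidden, batch_first=True)  # intra + inter info for video
        self.enhanced = nn.GRU(2 * hidden, hidden, batch_first=True)  # modality cohesion
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        a, v = self.audio_enc(audio), self.video_enc(video)
        a2, _ = self.collab_a(torch.cat([a, v], dim=-1))
        v2, _ = self.collab_v(torch.cat([v, a], dim=-1))
        fused, _ = self.enhanced(torch.cat([a2, v2], dim=-1))
        return self.head(fused.mean(dim=1))  # sequence-level depression logits


if __name__ == "__main__":
    model = ProgressiveFusion(hidden=32)
    logits = model(torch.randn(4, 100, 64), torch.randn(4, 100, 128))
    print(logits.shape)  # torch.Size([4, 2])
```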
Related papers
- Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model [18.19558762805031]
This paper proposes the Coupled SSM model for coupling the state chains of multiple modalities while maintaining the independence of intra-modality state processes.
Experiments with multi-domain input on CMU-MOSEI, CH-SIMS, and CH-SIMSV2 verify the effectiveness of our model.
Results demonstrate that Coupled Mamba model is capable of enhanced multi-modal fusion.
arXiv Detail & Related papers (2024-05-28T09:57:03Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba [17.75933946414591]
Multi-modal image fusion aims to combine information from different modalities to create a single image with detailed textures.
Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity.
We propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba.
arXiv Detail & Related papers (2024-04-15T06:37:21Z)
- MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion [4.2474907126377115]
Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image.
We propose a Mamba-based Dual-phase Fusion model (MambaDFuse) to extract modality-specific and modality-fused features.
Our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2024-04-12T11:33:26Z)
- SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction [8.452410804749512]
We propose SurvMamba, a Mamba-based structured state space model with multi-grained multi-modal interaction, for survival prediction.
SurvMamba is implemented with a Hierarchical Interaction Mamba (HIM) module that facilitates efficient intra-modal interactions at different granularities.
An Interaction Fusion Mamba (IFM) module is used for cascaded inter-modal interactive fusion, yielding more comprehensive features for survival prediction.
arXiv Detail & Related papers (2024-04-11T15:58:12Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of dynamic fusion under a widely used multimodal fusion framework from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation [29.157465321864265]
We propose a novel Nested Modality-Aware Transformer (NestedFormer) to explore the intra-modality and inter-modality relationships of multi-modal MRIs for brain tumor segmentation.
Built on the transformer-based multi-encoder and single-decoder structure, we perform nested multi-modal fusion for high-level representations of different modalities.
arXiv Detail & Related papers (2022-08-31T14:04:25Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z)