CANAMRF: An Attention-Based Model for Multimodal Depression Detection
- URL: http://arxiv.org/abs/2401.02995v1
- Date: Thu, 4 Jan 2024 12:08:16 GMT
- Title: CANAMRF: An Attention-Based Model for Multimodal Depression Detection
- Authors: Yuntao Wei, Yuzhe Zhang, Shuyang Zhang, and Hong Zhang
- Abstract summary: We present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection.
CANAMRF comprises a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module.
- Score: 7.266707571724883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal depression detection is an important research topic that aims to
predict human mental states using multimodal data. Previous methods treat
different modalities equally and fuse each modality by naïve mathematical
operations without measuring their relative importance, and therefore cannot
produce well-performing multimodal representations for downstream depression
detection tasks. To address this concern, we present a Cross-modal
Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for
multimodal depression detection. CANAMRF comprises a multimodal feature
extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid
Attention Module. Through experimentation on two benchmark datasets, CANAMRF
demonstrates state-of-the-art performance, underscoring the effectiveness of
our proposed approach.
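As a rough, non-authoritative illustration of the kind of architecture the abstract describes (cross-modal attention plus adaptive weighting of modalities instead of equal treatment), here is a minimal PyTorch sketch; the module layout, dimensions, and weighting scheme are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of cross-modal attention fusion in the spirit of CANAMRF.
# Module names, dimensions, and the weighting scheme are illustrative
# assumptions, not the authors' released implementation.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim: int = 128, n_heads: int = 4):
        super().__init__()
        # Text features act as queries over the audio and visual sequences.
        self.attn_text_audio = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.attn_text_visual = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Learnable scores give each fused stream its own importance instead
        # of treating all modalities equally.
        self.modality_scores = nn.Parameter(torch.zeros(2))
        self.classifier = nn.Linear(dim, 2)  # depressed / not depressed

    def forward(self, text, audio, visual):
        # text, audio, visual: (batch, seq_len, dim) sequence features.
        fused_audio, _ = self.attn_text_audio(query=text, key=audio, value=audio)
        fused_visual, _ = self.attn_text_visual(query=text, key=visual, value=visual)
        weights = torch.softmax(self.modality_scores, dim=0)
        fused = weights[0] * fused_audio + weights[1] * fused_visual
        pooled = fused.mean(dim=1)  # average over the sequence dimension
        return self.classifier(pooled)

if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    text, audio, visual = (torch.randn(8, 20, 128) for _ in range(3))
    print(model(text, audio, visual).shape)  # torch.Size([8, 2])
```

The learned softmax over modality scores is only a stand-in for the adaptive fusion the abstract emphasizes: modalities are combined with data-driven rather than fixed, equal weights.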
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - Multi-modal Crowd Counting via a Broker Modality [64.5356816448361]
Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images.
We propose a novel approach by introducing an auxiliary broker modality and frame the task as a triple-modal learning problem.
We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models.
arXiv Detail & Related papers (2024-07-10T10:13:11Z) - AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data.
The resulting overhead has hindered the widespread deployment of multispectral pedestrian detection on embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
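The abstract is high level; purely as a hedged sketch, the snippet below shows a generic fusion-distillation loss in which a lightweight student matches both the teacher's fused prediction and its per-modality features. The loss terms and weights are illustrative assumptions, not the AMFD formulation.

```python
# Generic fusion-distillation sketch: soft-label distillation on the fused
# logits plus feature matching against each teacher modality branch.
# Terms and weights are assumptions made for illustration.
import torch
import torch.nn.functional as F

def fusion_distillation_loss(student_logits, teacher_logits,
                             student_feats, teacher_feats, labels,
                             temperature=2.0, alpha=0.5):
    # Soft-label distillation on the fused prediction.
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    # Feature matching against each of the teacher's modality branches.
    feat = sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    ce = F.cross_entropy(student_logits, labels)
    return ce + alpha * kd + (1 - alpha) * feat

loss = fusion_distillation_loss(
    torch.randn(8, 2), torch.randn(8, 2),
    [torch.randn(8, 64) for _ in range(2)],
    [torch.randn(8, 64) for _ in range(2)],
    torch.randint(0, 2, (8,)))
print(loss.item())
```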
arXiv Detail & Related papers (2024-05-21T17:17:17Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
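A minimal sketch of the fixed-point idea follows, assuming a simple fusion layer and plain forward iteration; the actual DEQ approach uses root-finding solvers with implicit differentiation rather than the naive loop shown here.

```python
# Toy sketch of deep-equilibrium-style fusion: iterate a fusion layer until
# the fused state stops changing (an approximate fixed point). The layer,
# tolerance, and solver are simplified assumptions.
import torch
import torch.nn as nn

class FixedPointFusion(nn.Module):
    def __init__(self, dim: int = 64, max_iters: int = 50, tol: float = 1e-4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.max_iters, self.tol = max_iters, tol

    def forward(self, x_a, x_b):
        # x_a, x_b: (batch, dim) unimodal features; z is the fused state.
        z = torch.zeros_like(x_a)
        for _ in range(self.max_iters):
            z_next = self.f(torch.cat([z + x_a, x_b], dim=-1))
            if (z_next - z).norm() < self.tol:
                break
            z = z_next
        return z  # approximate fixed point z* = f(z*, x_a, x_b)

fusion = FixedPointFusion()
z_star = fusion(torch.randn(4, 64), torch.randn(4, 64))
print(z_star.shape)  # torch.Size([4, 64])
```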
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of dynamic multimodal fusion under a widely used fusion framework, from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
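As a toy, hedged illustration of the general idea: each modality's prediction can be weighted by a per-sample confidence estimate so that low-quality inputs contribute less. The maximum softmax probability used below is an illustrative stand-in, not the weighting QMF actually derives.

```python
# Illustrative quality-aware late fusion: per-sample confidence (here the max
# softmax probability, an assumption) downweights low-quality modalities.
import torch
import torch.nn.functional as F

def quality_aware_fusion(logits_per_modality):
    """logits_per_modality: list of (batch, n_classes) tensors."""
    weights = []
    for logits in logits_per_modality:
        conf = F.softmax(logits, dim=-1).max(dim=-1, keepdim=True).values
        weights.append(conf)
    total = torch.stack(weights, dim=0).sum(dim=0)
    fused = torch.zeros_like(logits_per_modality[0])
    for logits, conf in zip(logits_per_modality, weights):
        fused = fused + (conf / total) * logits   # confidence-weighted average
    return fused

out = quality_aware_fusion([torch.randn(8, 3), torch.randn(8, 3)])
print(out.shape)  # torch.Size([8, 3])
```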
arXiv Detail & Related papers (2023-06-03T08:32:35Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical
Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
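As a loose sketch of per-sample modality gating in general (the gate below and its soft weighting are assumptions, not HCT-DMG's actual architecture):

```python
# Toy modality gate: a small network scores each crossmodal branch per sample
# and softly reweights them, loosely mirroring the idea of routing around
# incongruent modalities. Gate design is an illustrative assumption.
import torch
import torch.nn as nn

class DynamicModalityGate(nn.Module):
    def __init__(self, dim: int = 64, n_branches: int = 3):
        super().__init__()
        self.gate = nn.Linear(n_branches * dim, n_branches)

    def forward(self, branch_feats):
        # branch_feats: list of (batch, dim) crossmodal branch outputs.
        stacked = torch.stack(branch_feats, dim=1)             # (batch, B, dim)
        scores = self.gate(stacked.flatten(1))                 # (batch, B)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # soft gating
        return (weights * stacked).sum(dim=1)                  # (batch, dim)

gate = DynamicModalityGate()
out = gate([torch.randn(4, 64) for _ in range(3)])
print(out.shape)  # torch.Size([4, 64])
```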
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - UniS-MMC: Multimodal Classification via Unimodality-supervised
Multimodal Contrastive Learning [29.237813880311943]
We propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal predicting.
Experimental results with fused features on two image-text classification benchmarks show that the proposed Unimodality-Supervised Multimodal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal methods.
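A hedged sketch of what unimodality-supervised contrastive alignment could look like: image-text pairs are aligned with an InfoNCE-style loss whose per-sample weight depends on whether the unimodal heads already predict the label correctly. The weighting rule is an illustrative assumption, not the exact UniS-MMC objective.

```python
# Illustrative unimodality-weighted image-text contrastive loss. The agreement
# weighting is an assumption for illustration only.
import torch
import torch.nn.functional as F

def unimodal_weighted_contrastive(img_emb, txt_emb, img_logits, txt_logits,
                                  labels, temperature=0.1):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / temperature              # (batch, batch)
    targets = torch.arange(sim.size(0), device=sim.device) # matched pairs on diagonal
    per_sample = F.cross_entropy(sim, targets, reduction="none")
    # Weight each anchor by how well its unimodal predictions match the label.
    agree = ((img_logits.argmax(-1) == labels).float() +
             (txt_logits.argmax(-1) == labels).float()) / 2.0
    return (agree * per_sample).mean()

loss = unimodal_weighted_contrastive(
    torch.randn(8, 64), torch.randn(8, 64),
    torch.randn(8, 5), torch.randn(8, 5),
    torch.randint(0, 5, (8,)))
print(loss.item())
```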
arXiv Detail & Related papers (2023-05-16T09:18:38Z) - Depression Diagnosis and Analysis via Multimodal Multi-order Factor
Fusion [4.991507302519828]
Depression is a leading cause of death worldwide, and the diagnosis of depression is nontrivial.
We propose a multimodal multi-order factor fusion (MMFF) method for automatic diagnosis of depression.
arXiv Detail & Related papers (2022-12-31T17:13:06Z) - Multimodal Channel-Mixing: Channel and Spatial Masked AutoEncoder on
Facial Action Unit Detection [12.509298933267225]
This paper presents a novel multi-modal reconstruction network, named Multimodal Channel-Mixing (MCM) as a pre-trained model to learn robust representation for facilitating multi-modal fusion.
The approach follows an early fusion setup, integrating a Channel-Mixing module, where two out of five channels are randomly dropped.
This module not only reduces channel redundancy, but also facilitates multi-modal learning and reconstruction capabilities, resulting in robust feature learning.
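A minimal sketch of the described masking step, assuming the five channels come from an early fusion of, e.g., three RGB channels and two auxiliary channels (the exact channel composition is an assumption here):

```python
# Channel-mixing masking step: stack modalities into a five-channel input and
# zero two randomly chosen channels per sample; a reconstruction network is
# then trained to recover the full input from the remaining channels.
import torch

def channel_mixing_mask(x, n_drop: int = 2):
    """x: (batch, 5, H, W) early-fused input; returns a masked copy."""
    batch, channels = x.shape[0], x.shape[1]
    masked = x.clone()
    for i in range(batch):
        drop = torch.randperm(channels)[:n_drop]  # pick 2 of the 5 channels
        masked[i, drop] = 0.0
    return masked

x = torch.randn(4, 5, 32, 32)
print(channel_mixing_mask(x).shape)  # torch.Size([4, 5, 32, 32])
```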
arXiv Detail & Related papers (2022-09-25T15:18:56Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)