CANAMRF: An Attention-Based Model for Multimodal Depression Detection
- URL: http://arxiv.org/abs/2401.02995v1
- Date: Thu, 4 Jan 2024 12:08:16 GMT
- Title: CANAMRF: An Attention-Based Model for Multimodal Depression Detection
- Authors: Yuntao Wei, Yuzhe Zhang, Shuyang Zhang, and Hong Zhang
- Abstract summary: We present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection.
CANAMRF consists of a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module.
- Score: 7.266707571724883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal depression detection is an important research topic that aims to
predict human mental states using multimodal data. Previous methods treat
different modalities equally and fuse them with naïve mathematical operations,
without weighing their relative importance, and thus fail to obtain strong
multimodal representations for downstream depression detection tasks. To
address this concern, we present a Cross-modal
Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for
multimodal depression detection. CANAMRF consists of a multimodal feature
extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid
Attention Module. In experiments on two benchmark datasets, CANAMRF
demonstrates state-of-the-art performance, underscoring the effectiveness of
our proposed approach.
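The abstract does not include implementation details. As a rough illustration of the kind of pipeline it describes (modality-specific features, cross-modal attention, and an adaptive weighted fusion step), a minimal PyTorch sketch might look as follows; the module names, dimensions, GRU-based recurrence, and classification head are assumptions for illustration, not the authors' CANAMRF implementation.

```python
# Minimal sketch (not the authors' code): cross-modal attention over
# per-modality features followed by an adaptive, learned-weight fusion.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Lets a query modality attend to a key/value modality."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feats, context_feats):
        out, _ = self.attn(query_feats, context_feats, context_feats)
        return out

class AdaptiveFusion(nn.Module):
    """Fuses modality features with softmax-normalized learned weights,
    applied recurrently through a GRU cell (an assumption in this sketch)."""
    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_modalities))
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, modality_feats):  # list of (batch, dim) tensors
        alphas = torch.softmax(self.weights, dim=0)
        state = torch.zeros_like(modality_feats[0])
        for alpha, feat in zip(alphas, modality_feats):
            state = self.gru(alpha * feat, state)
        return state

class DepressionClassifier(nn.Module):
    def __init__(self, dim: int = 128, num_modalities: int = 3):
        super().__init__()
        self.cross_attn = CrossModalAttention(dim)
        self.fusion = AdaptiveFusion(dim, num_modalities)
        self.head = nn.Linear(dim, 2)  # depressed vs. not depressed

    def forward(self, text, audio, video):  # each (batch, seq, dim)
        # Text attends to audio and video; sequences are then mean-pooled.
        t2a = self.cross_attn(text, audio).mean(dim=1)
        t2v = self.cross_attn(text, video).mean(dim=1)
        pooled_text = text.mean(dim=1)
        fused = self.fusion([pooled_text, t2a, t2v])
        return self.head(fused)

model = DepressionClassifier()
logits = model(torch.randn(2, 16, 128), torch.randn(2, 40, 128), torch.randn(2, 24, 128))
print(logits.shape)  # torch.Size([2, 2])
```

The learned softmax weights here stand in for the "relative importance" that the abstract argues naive fusion ignores; the actual Adaptive Multimodal Recurrent Fusion and Hybrid Attention modules are likely more elaborate.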
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
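For readers unfamiliar with the distillation recipe referenced in the MIND entry above, the following is a generic temperature-scaled soft-target loss in PyTorch; the temperature, mixing weight, and binary label shapes are illustrative assumptions, not details taken from MIND.

```python
# Generic soft-target knowledge distillation (illustrative only).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with temperature-scaled KL to the teacher."""
    soft_targets = F.log_softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(student_log_probs, soft_targets, log_target=True,
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Example: 4 samples, binary clinical label.
loss = distillation_loss(torch.randn(4, 2), torch.randn(4, 2), torch.tensor([0, 1, 1, 0]))
```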
- Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs).
We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs.
We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance the two stages of modality-incomplete industrial anomaly detection (MIIAD), improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional multimodal industrial anomaly detection (MIAD) methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- Multi-modal Crowd Counting via a Broker Modality [64.5356816448361]
Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images.
We propose a novel approach that introduces an auxiliary broker modality and frames the task as a triple-modal learning problem.
We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models.
arXiv Detail & Related papers (2024-07-10T10:13:11Z)
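The broker-modality idea above can be pictured as synthesizing a third stream that bridges the RGB and thermal inputs. The sketch below is a hedged, generic illustration: a small convolutional fuser generates the broker features and a density head consumes all three streams. The architecture, channel sizes, and fusion operator are assumptions, not the paper's model.

```python
# Illustrative broker-modality fusion for RGB-thermal crowd counting (not the paper's code).
import torch
import torch.nn as nn

class BrokerFusionCounter(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, channels, 3, padding=1)
        self.thermal_enc = nn.Conv2d(1, channels, 3, padding=1)
        # Lightweight "broker" generator: fuses the two encoded streams.
        self.broker = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.ReLU(inplace=True))
        # Density head consumes RGB, thermal, and broker features together.
        self.density_head = nn.Conv2d(3 * channels, 1, 1)

    def forward(self, rgb, thermal):
        r = torch.relu(self.rgb_enc(rgb))
        t = torch.relu(self.thermal_enc(thermal))
        b = self.broker(torch.cat([r, t], dim=1))
        return self.density_head(torch.cat([r, t, b], dim=1))  # predicted density map

model = BrokerFusionCounter()
density = model(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(density.shape)  # torch.Size([1, 1, 64, 64])
```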
- AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data.
This design has hindered the widespread deployment of multispectral pedestrian detection on embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z)
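The AMFD entry above describes distilling a double-stream teacher's adaptively fused features into a single-stream student. As a loose, hedged stand-in for that idea, the sketch below matches student features to an energy-weighted fusion of the teacher's two modality feature maps with an MSE loss; the weighting rule and loss are assumptions, since the actual framework learns its fusion.

```python
# Illustrative fusion-feature distillation (a generic recipe, not AMFD's exact losses).
import torch
import torch.nn.functional as F

def fused_feature_distillation(student_feat, teacher_rgb_feat, teacher_ir_feat):
    """Match the student's single-branch features to an adaptively fused teacher target.

    The per-sample fusion weight here is derived from feature energy; the real
    method learns its fusion, so treat this weighting as a stand-in.
    """
    energy_rgb = teacher_rgb_feat.pow(2).mean(dim=(1, 2, 3), keepdim=True)
    energy_ir = teacher_ir_feat.pow(2).mean(dim=(1, 2, 3), keepdim=True)
    w = energy_rgb / (energy_rgb + energy_ir + 1e-6)
    fused_teacher = w * teacher_rgb_feat + (1 - w) * teacher_ir_feat
    return F.mse_loss(student_feat, fused_teacher.detach())

loss = fused_feature_distillation(torch.randn(2, 64, 32, 32),
                                  torch.randn(2, 64, 32, 32),
                                  torch.randn(2, 64, 32, 32))
```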
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
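The deep-equilibrium fusion described above treats the fused representation as a fixed point of a fusion operator. The sketch below runs a plain fixed-point iteration to make that concrete; the operator, tolerance, and naive solver (ordinary backpropagation rather than an implicit-gradient backward pass) are simplifying assumptions, not the paper's method.

```python
# Minimal fixed-point (deep equilibrium) fusion sketch -- simple iteration only.
import torch
import torch.nn as nn

class EquilibriumFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Linear(3 * dim, dim)

    def fusion_step(self, z, x_a, x_b):
        # One application of the fusion operator f(z; x_a, x_b).
        return torch.tanh(self.mix(torch.cat([z, x_a, x_b], dim=-1)))

    def forward(self, x_a, x_b, max_iter=50, tol=1e-4):
        z = torch.zeros_like(x_a)
        for _ in range(max_iter):
            z_next = self.fusion_step(z, x_a, x_b)
            if (z_next - z).norm() < tol:  # stop once the iterate stabilizes
                break
            z = z_next
        return z  # approximate fixed point z* = f(z*; x_a, x_b)

fused = EquilibriumFusion(dim=64)(torch.randn(2, 64), torch.randn(2, 64))
print(fused.shape)  # torch.Size([2, 64])
```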
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of this paradigm, analyzing a popular multimodal fusion framework from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves both classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
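A minimal way to picture quality-aware dynamic fusion is to weight each modality's prediction by a per-sample confidence estimate. The sketch below uses maximum softmax probability as that estimate; QMF's actual uncertainty estimation and theoretically derived weighting differ, so treat this purely as an illustrative stand-in.

```python
# Illustrative confidence-weighted dynamic fusion (a generic stand-in, not QMF).
import torch
import torch.nn.functional as F

def dynamic_fusion(logits_per_modality):
    """Weight each modality's logits by per-sample confidence (max softmax prob)."""
    confidences = [F.softmax(l, dim=-1).max(dim=-1, keepdim=True).values
                   for l in logits_per_modality]
    conf = torch.cat(confidences, dim=-1)                # (batch, num_modalities)
    weights = conf / conf.sum(dim=-1, keepdim=True)      # normalize per sample
    stacked = torch.stack(logits_per_modality, dim=-1)   # (batch, classes, M)
    return (stacked * weights.unsqueeze(1)).sum(dim=-1)  # fused logits

fused = dynamic_fusion([torch.randn(4, 3), torch.randn(4, 3)])
print(fused.shape)  # torch.Size([4, 3])
```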
- UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning [29.237813880311943]
We propose a novel multimodal contrastive method that learns more reliable multimodal representations under the weak supervision of unimodal predictions.
Experimental results with fused features on two image-text classification benchmarks show that the proposed Unimodality-Supervised Multimodal Contrastive learning method (UniS-MMC) outperforms current state-of-the-art multimodal methods.
arXiv Detail & Related papers (2023-05-16T09:18:38Z)
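The contrastive component of the entry above can be illustrated with a standard cross-modal InfoNCE loss in which the two modality views of a sample form the positive pair. The sketch below shows only that generic alignment term; the unimodal-prediction supervision that defines UniS-MMC is deliberately omitted, so this is an assumption-laden simplification.

```python
# Illustrative cross-modal contrastive alignment (InfoNCE over paired samples).
import torch
import torch.nn.functional as F

def cross_modal_nce(feat_a, feat_b, temperature=0.1):
    """Treat the two modality views of the same sample as a positive pair."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    logits = a @ b.t() / temperature                 # (batch, batch) similarities
    targets = torch.arange(a.size(0))                # diagonal = matching pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_nce(torch.randn(8, 128), torch.randn(8, 128))
```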
- Depression Diagnosis and Analysis via Multimodal Multi-order Factor Fusion [4.991507302519828]
Depression is a leading cause of death worldwide, and the diagnosis of depression is nontrivial.
We propose a multimodal multi-order factor fusion (MMFF) method for automatic diagnosis of depression.
arXiv Detail & Related papers (2022-12-31T17:13:06Z)
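"Multi-order factor fusion" can be loosely illustrated by combining first-order (concatenated) features with second-order (outer-product) interaction terms before classification. The sketch below follows that generic recipe; it is not the MMFF formulation, and the dimensions and classifier are assumptions.

```python
# Illustrative first- plus second-order factor fusion (not the MMFF formulation).
import torch
import torch.nn as nn

class MultiOrderFusion(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(dim_a + dim_b + dim_a * dim_b, num_classes)

    def forward(self, feat_a, feat_b):
        first_order = torch.cat([feat_a, feat_b], dim=-1)
        # Second-order factor: pairwise feature interactions via outer product.
        second_order = torch.einsum("bi,bj->bij", feat_a, feat_b).flatten(1)
        return self.classifier(torch.cat([first_order, second_order], dim=-1))

logits = MultiOrderFusion(16, 16)(torch.randn(4, 16), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```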
- Multimodal Channel-Mixing: Channel and Spatial Masked AutoEncoder on Facial Action Unit Detection [12.509298933267225]
This paper presents a novel multi-modal reconstruction network, named Multimodal Channel-Mixing (MCM), as a pre-training model for learning robust representations that facilitate multi-modal fusion.
The approach follows an early-fusion setup with a Channel-Mixing module in which two of the five input channels are randomly dropped.
This module not only reduces channel redundancy but also encourages cross-modal learning and reconstruction, resulting in robust feature learning.
arXiv Detail & Related papers (2022-09-25T15:18:56Z)
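The Channel-Mixing pretext described above (randomly dropping two of five stacked input channels and reconstructing them) can be sketched with a tiny autoencoder as below; the channel layout, encoder, and decoder are assumptions for illustration rather than the MCM architecture.

```python
# Illustrative channel-mixing pretext: stack five input channels (e.g., RGB plus
# two extra modality channels), randomly drop two, and reconstruct the full stack.
import torch
import torch.nn as nn

class ChannelMixingAE(nn.Module):
    def __init__(self, channels: int = 5, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Conv2d(channels, hidden, 3, padding=1)
        self.decoder = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x):                          # x: (batch, 5, H, W)
        masked = x.clone()
        for i in range(x.size(0)):
            drop = torch.randperm(x.size(1))[:2]   # drop 2 of 5 channels per sample
            masked[i, drop] = 0.0
        recon = self.decoder(torch.relu(self.encoder(masked)))
        return recon, masked

x = torch.rand(2, 5, 64, 64)
recon, masked = ChannelMixingAE()(x)
loss = nn.functional.mse_loss(recon, x)            # reconstruction objective
```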