Multimodal Fusion Learning with Dual Attention for Medical Imaging
- URL: http://arxiv.org/abs/2412.01248v1
- Date: Mon, 02 Dec 2024 08:11:12 GMT
- Title: Multimodal Fusion Learning with Dual Attention for Medical Imaging
- Authors: Joy Dhar, Nayyar Zaidi, Maryam Haghighat, Puneet Goyal, Sudipta Roy, Azadeh Alavi, Vikas Kumar
- Abstract summary: Multimodal fusion learning has shown significant promise in classifying various diseases such as skin cancer and brain tumors.
Existing methods face three key limitations.
DRIFA can be integrated with any deep neural network, forming a multimodal fusion learning framework denoted as DRIFA-Net.
- Score: 8.74917075651321
- Abstract: Multimodal fusion learning has shown significant promise in classifying various diseases such as skin cancer and brain tumors. However, existing methods face three key limitations. First, they often lack generalizability to other diagnosis tasks due to their focus on a particular disease. Second, they do not fully leverage multiple health records from diverse modalities to learn robust complementary information. Finally, they typically rely on a single attention mechanism, missing the benefits of multiple attention strategies within and across various modalities. To address these issues, this paper proposes a dual robust information fusion attention mechanism (DRIFA) that leverages two attention modules, i.e., the multi-branch fusion attention module and the multimodal information fusion attention module. DRIFA can be integrated with any deep neural network, forming a multimodal fusion learning framework denoted as DRIFA-Net. We show that the multi-branch fusion attention of DRIFA learns enhanced representations for each modality, such as dermoscopy, pap smear, MRI, and CT-scan, whereas the multimodal information fusion attention module learns more refined multimodal shared representations, improving the network's generalization across multiple tasks and enhancing overall performance. Additionally, to estimate the uncertainty of DRIFA-Net predictions, we have employed an ensemble Monte Carlo dropout strategy. Extensive experiments on five publicly available datasets with diverse modalities demonstrate that our approach consistently outperforms state-of-the-art methods. The code is available at https://github.com/misti1203/DRIFA-Net.
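The abstract mentions Monte Carlo dropout for uncertainty estimation. The following is a minimal sketch of plain Monte Carlo dropout, not the ensemble variant used in DRIFA-Net and not the authors' implementation. It keeps dropout active at inference, runs several stochastic forward passes, averages the class probabilities, and reports the predictive entropy as an uncertainty score. The toy linear classifier, the function names, the feature values, and the weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(features, weights, drop_prob, rng):
    """One stochastic pass: inverted dropout on the inputs, then linear + softmax."""
    keep = 1.0 - drop_prob
    # Each feature is kept with probability `keep` and rescaled by 1/keep.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_prob=0.5, n_passes=100, seed=0):
    """Average class probabilities over n_passes stochastic forward passes.

    Returns (mean_probs, entropy), where entropy is the predictive entropy
    of the averaged distribution (higher means more uncertain).
    """
    rng = random.Random(seed)
    passes = [
        forward_with_dropout(features, weights, drop_prob, rng)
        for _ in range(n_passes)
    ]
    n_classes = len(weights)
    mean_probs = [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


# Hypothetical fused feature vector and a 3-class linear head (illustrative values).
features = [0.8, -0.3, 1.2, 0.5]
weights = [
    [1.0, 0.2, -0.5, 0.3],
    [-0.4, 0.9, 0.6, -0.2],
    [0.1, -0.7, 0.8, 0.5],
]

mean_probs, entropy = mc_dropout_predict(features, weights)
print("mean class probabilities:", mean_probs)
print("predictive entropy:", entropy)
```

In a real network the dropout layers would sit inside the model rather than on the input features, and an ensemble variant would presumably repeat this procedure across several models; those details are not specified in the abstract.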
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- Deep Multimodal Collaborative Learning for Polyp Re-Identification [4.4028428688691905]
Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras.
Traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset produce unsatisfactory retrieval performance.
We propose a novel Deep Multimodal Collaborative Learning framework named DMCL for polyp re-identification.
arXiv Detail & Related papers (2024-08-12T04:05:19Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation [12.094890186803958]
We present a novel Modality Aware and Shift Mixer that integrates intra-modality and inter-modality dependencies of multi-modal images for effective and robust brain tumor segmentation.
Specifically, we introduce a Modality-Aware module according to neuroimaging studies for modeling the specific modality pair relationships at low levels, and a Modality-Shift module with specific mosaic patterns is developed to explore the complex relationships across modalities at high levels via the self-attention.
arXiv Detail & Related papers (2024-03-04T14:21:51Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution [56.622832383316215]
We propose a novel multi-head convolutional attention module to super-resolve CT and MRI scans.
Our attention module uses the convolution operation to perform joint spatial-channel attention on multiple input tensors.
We introduce multiple attention heads, each head having a distinct receptive field size corresponding to a particular reduction rate for the spatial attention.
arXiv Detail & Related papers (2022-04-08T07:56:55Z)
- A Multi-View Dynamic Fusion Framework: How to Improve the Multimodal Brain Tumor Segmentation from Multi-Views? [5.793853101758628]
This paper proposes a multi-view dynamic fusion framework to improve the performance of brain tumor segmentation.
By evaluating the proposed framework on BRATS 2015 and BRATS 2018, it can be found that the fusion results from multi-views achieve a better performance than the segmentation result from the single view.
arXiv Detail & Related papers (2020-12-21T09:45:23Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.