Effective and Robust Multimodal Medical Image Analysis
- URL: http://arxiv.org/abs/2602.15346v1
- Date: Tue, 17 Feb 2026 04:23:46 GMT
- Title: Effective and Robust Multimodal Medical Image Analysis
- Authors: Joy Dhar, Nayyar Zaidi, Maryam Haghighat
- Abstract summary: Multimodal Fusion Learning (MFL) has shown great potential for addressing medical problems such as skin cancer and brain tumor prediction. Existing MFL methods face three key limitations. We propose a novel Multi-Attention Integration Learning (MAIL) network, incorporating two key components.
- Score: 2.0518682437126095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Fusion Learning (MFL), leveraging disparate data from various imaging modalities (e.g., MRI, CT, SPECT), has shown great potential for addressing medical problems such as skin cancer and brain tumor prediction. However, existing MFL methods face three key limitations: a) they often specialize in specific modalities and overlook effective shared complementary information across diverse modalities, limiting their generalizability for multi-disease analysis; b) they rely on computationally expensive models, restricting their applicability in resource-limited settings; and c) they lack robustness against adversarial attacks, compromising reliability in medical AI applications. To address these limitations, we propose a novel Multi-Attention Integration Learning (MAIL) network, incorporating two key components: a) an efficient residual learning attention block for capturing refined modality-specific multi-scale patterns and b) an efficient multimodal cross-attention module for learning enriched complementary shared representations across diverse modalities. Furthermore, to ensure adversarial robustness, we extend the MAIL network to design Robust-MAIL by incorporating random projection filters and modulated attention noise. Extensive evaluations on 20 public datasets show that both MAIL and Robust-MAIL outperform existing methods, achieving performance gains of up to 9.34% while reducing computational costs by up to 78.3%. These results highlight the superiority of our approaches, which deliver more reliable predictions than top competitors. Code: https://github.com/misti1203/MAIL-Robust-MAIL.
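The abstract names the modules but gives no implementation detail; a minimal PyTorch sketch of a multimodal cross-attention step in this spirit (all class and parameter names here are hypothetical, not from the released MAIL code) could look like:

```python
# Hedged sketch of a multimodal cross-attention fusion step (hypothetical
# names; not the released MAIL implementation). One modality's tokens act
# as queries against another modality's tokens, so each branch can absorb
# complementary information from the other.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a: (B, N_a, dim) query tokens; x_b: (B, N_b, dim) key/value tokens.
        fused, _ = self.attn(query=x_a, key=x_b, value=x_b)
        return self.norm(x_a + fused)  # residual keeps modality-specific signal

# Toy usage: fuse MRI-branch tokens with CT-branch tokens.
mri = torch.randn(2, 64, 128)   # (batch, tokens, dim)
ct = torch.randn(2, 49, 128)
fusion = CrossModalAttention(dim=128)
print(fusion(mri, ct).shape)    # torch.Size([2, 64, 128])
```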
Related papers
- HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis [0.0]
We propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net). HyPCA-Net is composed of two core novel blocks: (a) a computationally efficient residual adaptive learning attention block for capturing modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Experiments show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost.
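As a rough illustration of the "residual adaptive learning attention block" idea (the layer choices below are assumptions, not the HyPCA-Net implementation), parallel multi-scale convolutions can be gated and added back residually:

```python
# Hedged sketch of a residual multi-scale attention block: parallel
# convolutions capture patterns at several receptive fields, a channel gate
# reweights them, and a residual path preserves the input.
import torch
import torch.nn as nn

class ResidualMultiScaleAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.gate = nn.Sequential(       # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = sum(b(x) for b in self.branches)  # merge multi-scale features
        return x + multi * self.gate(multi)       # gated residual update

x = torch.randn(2, 32, 56, 56)
print(ResidualMultiScaleAttention(32)(x).shape)  # torch.Size([2, 32, 56, 56])
```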
arXiv Detail & Related papers (2026-02-18T07:47:49Z)
- Robust Multimodal Sentiment Analysis via Double Information Bottleneck [55.32835720742616]
Multimodal sentiment analysis has received significant attention across diverse research domains. Existing approaches suffer from insufficient learning of noise-contaminated unimodal data. This paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified compact multimodal representation.
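For orientation, a generic variational information-bottleneck loss (not the paper's exact double-bottleneck objective) trades task accuracy against compression of the representation:

```python
# Illustrative variational information-bottleneck loss (a generic IB
# formulation, not DIB itself): the encoder emits a Gaussian posterior over
# the compact code z, a KL term squeezes out input noise, and a task term
# keeps z predictive of the label.
import torch
import torch.nn.functional as F

def vib_loss(logits, labels, mu, logvar, beta: float = 1e-3):
    task = F.cross_entropy(logits, labels)              # keep I(z; y) high
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return task + beta * kl                             # compress I(z; x)

mu, logvar = torch.randn(8, 16), torch.randn(8, 16)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
logits = torch.randn(8, 3, requires_grad=True)
print(vib_loss(logits, torch.randint(0, 3, (8,)), mu, logvar))
```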
arXiv Detail & Related papers (2025-11-03T10:52:45Z)
- MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series [7.657107258507061]
We introduce MAESTRO, a novel framework that overcomes key limitations of existing multimodal learning approaches. At its core, MAESTRO facilitates dynamic intra- and cross-modal interactions based on task relevance. We evaluate MAESTRO against 10 baselines on four diverse datasets spanning three applications.
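One common way to realize adaptive sparse attention is top-k score pruning; the sketch below is a generic illustration, and MAESTRO's actual sparsification rule may differ:

```python
# Generic top-k sparse attention step: each query attends only to its k
# highest-scoring keys, so low-relevance modality tokens are dropped from
# the interaction (an assumption-level stand-in for MAESTRO's mechanism).
import torch

def topk_sparse_attention(q, k, v, keep: int = 8):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, Nq, Nk)
    kth = scores.topk(keep, dim=-1).values[..., -1:]       # k-th best score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(2, 10, 32)
kv = torch.randn(2, 40, 32)
print(topk_sparse_attention(q, kv, kv).shape)  # torch.Size([2, 10, 32])
```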
arXiv Detail & Related papers (2025-09-29T03:07:06Z)
- Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction [17.717216490402482]
We propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning.
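The base mechanism, plain modality dropout, is easy to sketch (the paper's "improved" variant adds more than this):

```python
# Minimal modality-dropout sketch: during training, each modality's feature
# vector is zeroed with some probability, so the fusion head learns to cope
# with missing inputs at test time. Real implementations typically also
# guard against dropping every modality for the same sample.
import torch

def modality_dropout(feats: list[torch.Tensor], p: float = 0.3, training: bool = True):
    if not training:
        return feats
    dropped = []
    for f in feats:
        keep = (torch.rand(f.shape[0], 1, device=f.device) > p).float()
        dropped.append(f * keep)  # zero out the whole modality per sample
    return dropped

img, tab = torch.randn(4, 256), torch.randn(4, 32)
img_d, tab_d = modality_dropout([img, tab])
fused = torch.cat([img_d, tab_d], dim=-1)  # downstream fusion still works
print(fused.shape)  # torch.Size([4, 288])
```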
arXiv Detail & Related papers (2025-09-22T18:12:12Z)
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
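A toy masked-reconstruction step conveys the pre-training idea; the shapes, masking ratio, and tiny encoder below are assumptions for illustration, not impuTMAE's architecture:

```python
# Masked-reconstruction pre-training in miniature: patches from all
# modalities are randomly hidden, and the model is trained to reconstruct
# them, which doubles as imputation of absent modalities.
import torch
import torch.nn as nn

patches = torch.randn(2, 100, 64)                  # (batch, patches, dim)
mask = torch.rand(2, 100) < 0.6                    # hide 60% of patches
mask_token = nn.Parameter(torch.zeros(64))
inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(patches), patches)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
recon = encoder(inp)                               # predict all patches
loss = ((recon - patches)[mask]).pow(2).mean()     # loss only on masked ones
print(loss)
```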
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
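The transfer step builds on standard knowledge distillation; a textbook temperature-scaled KD loss (MIND's ensemble-to-student setup adds machinery beyond this) looks like:

```python
# Standard temperature-scaled distillation loss, shown as a generic KD
# recipe: the student matches softened teacher probabilities in addition
# to fitting the true labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                                   # gradient scale correction
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 5, requires_grad=True), torch.randn(8, 5)
print(kd_loss(s, t, torch.randint(0, 5, (8,))))
```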
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- Multimodal Fusion Learning with Dual Attention for Medical Imaging [8.74917075651321]
Multimodal fusion learning has shown significant promise in classifying various diseases such as skin cancer and brain tumors. Existing methods face three key limitations. DRIFA can be integrated with any deep neural network, forming a multimodal fusion learning framework denoted as DRIFA-Net.
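One plausible reading of a dual-attention block is a CBAM-style channel gate followed by a spatial gate; the module below is illustrative rather than DRIFA-Net's actual design:

```python
# CBAM-style dual attention as one reading of "dual attention": a channel
# gate reweights feature maps, then a spatial gate highlights image regions.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)                           # which maps matter
        return x * self.spatial(x.mean(1, keepdim=True))  # where they matter

print(DualAttention(16)(torch.randn(2, 16, 28, 28)).shape)
```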
arXiv Detail & Related papers (2024-12-02T08:11:12Z)
- HyperMM : Robust Multimodal Learning with Varying-sized Inputs [4.377889826841039]
HyperMM is an end-to-end framework designed for learning with varying-sized inputs.
We introduce a novel strategy for training a universal feature extractor using a conditional hypernetwork.
We experimentally demonstrate the advantages of our method in two tasks: Alzheimer's disease detection and breast cancer classification.
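A conditional hypernetwork of the kind described can be sketched generically (names and sizes below are assumptions, not the HyperMM release): a small network maps a modality embedding to the weights of a per-sample linear feature extractor:

```python
# Hedged sketch of a conditional hypernetwork: the hypernetwork emits the
# weights and bias of a linear extractor conditioned on the modality ID,
# so one universal extractor handles inputs from different modalities.
import torch
import torch.nn as nn

class HyperExtractor(nn.Module):
    def __init__(self, num_modalities: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.embed = nn.Embedding(num_modalities, 32)
        self.hyper = nn.Linear(32, out_dim * in_dim + out_dim)  # weights + bias

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        params = self.hyper(self.embed(modality))            # (B, out*in + out)
        w = params[:, : self.out_dim * self.in_dim].view(-1, self.out_dim, self.in_dim)
        b = params[:, self.out_dim * self.in_dim :]
        return torch.einsum("boi,bi->bo", w, x) + b          # per-sample linear

net = HyperExtractor(num_modalities=3, in_dim=128, out_dim=64)
x = torch.randn(4, 128)
print(net(x, torch.tensor([0, 1, 2, 0])).shape)  # torch.Size([4, 64])
```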
arXiv Detail & Related papers (2024-07-30T12:13:18Z)
- Completed Feature Disentanglement Learning for Multimodal MRIs Analysis [36.32164729310868]
Feature disentanglement (FD)-based methods have achieved significant success in multimodal learning (MML). We propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the lost information during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of multimodal inputs.
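A common building block for such disentanglement is a shared/specific encoder pair with an orthogonality penalty; the toy snippet below illustrates only that base idea, not CFD's subset-level decoupling:

```python
# Toy disentanglement objective: separate encoders produce shared and
# modality-specific codes, and overlap between them is penalized so the
# two codes carry distinct content.
import torch
import torch.nn as nn
import torch.nn.functional as F

shared_enc = nn.Linear(128, 32)
specific_enc = nn.Linear(128, 32)

feat = torch.randn(8, 128)                     # one modality's features
z_shared, z_spec = shared_enc(feat), specific_enc(feat)

# Push the two codes toward orthogonality.
ortho = F.cosine_similarity(z_shared, z_spec, dim=-1).pow(2).mean()
print(ortho)  # add to the task loss with a small weight
```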
arXiv Detail & Related papers (2024-07-06T01:49:38Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
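In the same source-free setting, a generic adaptation step (entropy minimization on unlabeled target data, as in methods like Tent or SHOT; the paper's collaborative framework is richer than this) touches only the pretrained model:

```python
# Generic source-free adaptation step: sharpen the pretrained model's
# predictions on unlabeled target batches, never touching source data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # stand-in
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

target_batch = torch.randn(16, 64)             # unlabeled target-site features
probs = torch.softmax(model(target_batch), dim=-1)
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
entropy.backward()                             # minimize prediction entropy
opt.step(); opt.zero_grad()
print(float(entropy))
```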
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.