HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis
- URL: http://arxiv.org/abs/2602.16245v1
- Date: Wed, 18 Feb 2026 07:47:49 GMT
- Title: HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis
- Authors: J. Dhar, M. K. Pandey, D. Chakladar, M. Haghighat, A. Alavi, S. Mistry, N. Zaidi
- Abstract summary: We propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net) composed of two core novel blocks: (a) a computationally efficient residual adaptive learning attention block for capturing modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Experiments show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal fusion frameworks, which integrate diverse medical imaging modalities (e.g., MRI, CT), have shown great potential in applications such as skin cancer detection, dementia diagnosis, and brain tumor prediction. However, existing multimodal fusion methods face significant challenges. First, they often rely on computationally expensive models, limiting their applicability in low-resource environments. Second, they often employ cascaded attention modules, which increase the risk of information loss during inter-module transitions and hinder their capacity to capture robust shared representations across modalities; this restricts their generalization in multi-disease analysis tasks. To address these limitations, we propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net), composed of two core novel blocks: (a) a computationally efficient residual adaptive learning attention block for capturing refined modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Extensive experiments on ten publicly available datasets show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost. Code: https://github.com/misti1203/HyPCA-Net.
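The abstract names the two blocks but not their internals. The following minimal PyTorch sketch shows one plausible reading of the parallel-then-cascaded layout; every module body, dimension, and class name is an illustrative assumption, not the authors' implementation (see the linked repository for that).
```python
# Illustrative sketch only: parallel modality branches followed by cascaded
# cross-attention, as the abstract describes. All internals are assumptions,
# not the authors' code (https://github.com/misti1203/HyPCA-Net).
import torch
import torch.nn as nn

class ResidualAdaptiveAttention(nn.Module):
    """Assumed form of the 'residual adaptive learning attention block':
    a cheap squeeze-and-excite channel gate with a residual path."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x + x * self.gate(x)  # residual channel re-weighting

class DualViewCascadedAttention(nn.Module):
    """Assumed reading of 'dual-view cascaded attention': two chained
    cross-attention passes, one per direction (A queries B, then B
    queries the refined A)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):  # a, b: (batch, tokens, dim)
        a2, _ = self.attn_ab(a, b, b)    # view 1: A attends to B
        b2, _ = self.attn_ba(b, a2, a2)  # view 2: B attends to refined A
        return torch.cat([a2, b2], dim=1).mean(dim=1)  # shared representation

class HyPCANetSketch(nn.Module):
    def __init__(self, channels: int = 64, classes: int = 2):
        super().__init__()
        self.branch_a = ResidualAdaptiveAttention(channels)  # e.g., MRI branch
        self.branch_b = ResidualAdaptiveAttention(channels)  # e.g., CT branch
        self.fuse = DualViewCascadedAttention(channels)
        self.head = nn.Linear(channels, classes)

    def forward(self, xa, xb):  # (batch, channels, H, W) backbone features
        fa = self.branch_a(xa).flatten(2).transpose(1, 2)  # -> (B, HW, C)
        fb = self.branch_b(xb).flatten(2).transpose(1, 2)
        return self.head(self.fuse(fa, fb))

logits = HyPCANetSketch()(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))
print(logits.shape)  # torch.Size([2, 2])
```
The point of this layout: the per-modality branches stay cheap (a single squeeze-and-excite-style gate each), and the two streams interact only inside the cascaded cross-attention, which produces the shared representation.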
Related papers
- Effective and Robust Multimodal Medical Image Analysis [2.0518682437126095]
Multimodal Fusion Learning (MFL) has shown great potential for addressing medical problems such as skin cancer and brain tumor prediction. Existing MFL methods face three key limitations. We propose a novel Multi-Attention Integration Learning (MAIL) network, incorporating two key components.
arXiv Detail & Related papers (2026-02-17T04:23:46Z)
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context module that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
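A hypothetical sketch of the slice-aggregation idea: per-slice features are scored against a text embedding and pooled into one compact feature. The scoring function and all shapes are assumptions; the abstract does not specify them.
```python
# Hypothetical text-guided slice pooling in the spirit of Adaptive Slice
# Fusion; the real module and its text encoder are not given in the abstract.
import torch
import torch.nn as nn

class AdaptiveSliceFusionSketch(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)  # scores each slice against the text

    def forward(self, slice_feats, text_emb):
        # slice_feats: (batch, n_slices, dim); text_emb: (batch, dim)
        t = text_emb.unsqueeze(1).expand_as(slice_feats)
        w = torch.softmax(self.score(torch.cat([slice_feats, t], -1)), dim=1)
        return (w * slice_feats).sum(dim=1)  # compact 2D surrogate feature

feat = AdaptiveSliceFusionSketch()(torch.randn(2, 32, 256), torch.randn(2, 256))
print(feat.shape)  # torch.Size([2, 256])
```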
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
- Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation [28.992992584085787]
Multimodal learning has shown a significant performance boost over ordinary unimodal models. In real-world scenarios, however, multimodal signals are susceptible to missing data caused by sensor failures and adverse weather conditions. We propose a novel Generative-Enhanced MultiModal learning Network (GEMMNet) to tackle these limitations.
arXiv Detail & Related papers (2025-09-14T05:40:35Z)
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
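The masked-reconstruction idea generalizes naturally to missing modalities: their tokens are simply all masked. A generic sketch, with the transformer configuration and loss chosen for illustration rather than taken from the paper:
```python
# Generic masked multimodal reconstruction: masked/missing patch tokens are
# replaced by a learned mask token and reconstructed. Details are assumptions.
import torch
import torch.nn as nn

class MaskedMultimodalAE(nn.Module):
    def __init__(self, dim: int = 128, layers: int = 2):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.decode = nn.Linear(dim, dim)  # reconstruct patch embeddings

    def forward(self, tokens, mask):
        # tokens: (B, N, dim) patch embeddings from all modalities
        # mask:   (B, N) bool, True where a patch is masked or missing
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(tokens), tokens)
        return self.decode(self.encoder(x))

model = MaskedMultimodalAE()
tokens = torch.randn(2, 16, 128)
mask = torch.rand(2, 16) < 0.5                 # also covers absent modalities
recon = model(tokens, mask)
loss = (recon - tokens)[mask].pow(2).mean()    # loss only on masked patches
```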
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
- MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction [5.895727565919295]
This paper presents a Multimodal Representation Decoupling Network (MurreNet) to advance cancer survival analysis. MurreNet decomposes paired input data into modality-specific and modality-shared representations, thereby reducing redundancy between modalities. Experiments conducted on six TCGA cancer cohorts demonstrate that our MurreNet achieves state-of-the-art (SOTA) performance in survival prediction.
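A minimal sketch of such a specific/shared decoupling; the encoder shapes and the redundancy penalty below are illustrative assumptions, not MurreNet's actual objective.
```python
# Illustrative decoupling into modality-specific and modality-shared codes;
# the alignment and redundancy terms here are assumed, not the paper's losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledEncoders(nn.Module):
    def __init__(self, in_dim: int, dim: int = 128):
        super().__init__()
        self.specific_a = nn.Linear(in_dim, dim)  # histopathology-specific
        self.specific_b = nn.Linear(in_dim, dim)  # genomics-specific
        self.shared = nn.Linear(in_dim, dim)      # shared across modalities

    def forward(self, xa, xb):
        sa, sb = self.specific_a(xa), self.specific_b(xb)
        ha, hb = self.shared(xa), self.shared(xb)
        align = F.mse_loss(ha, hb)  # pull the two shared views together
        # Push specific codes away from the shared ones (reduce redundancy).
        redund = (F.cosine_similarity(sa, ha).abs().mean()
                  + F.cosine_similarity(sb, hb).abs().mean())
        fused = torch.cat([sa, sb, (ha + hb) / 2], dim=-1)
        return fused, align + redund

fused, reg = DecoupledEncoders(64)(torch.randn(4, 64), torch.randn(4, 64))
```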
arXiv Detail & Related papers (2025-07-07T11:26:29Z)
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
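The core step can be illustrated with standard soft-label distillation from a teacher ensemble; the temperature, loss weighting, and plain averaging of teachers are generic choices here, not MIND's exact formulation.
```python
# Generic ensemble-to-student soft-label distillation; hyperparameters and
# the averaging scheme are assumptions for illustration.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    # Average the teachers' temperature-softened distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)  # hard-label supervision
    return alpha * kd + (1 - alpha) * ce

loss = distill_loss(torch.randn(8, 5),
                    [torch.randn(8, 5), torch.randn(8, 5)],
                    torch.randint(0, 5, (8,)))
```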
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- Multimodal Fusion Learning with Dual Attention for Medical Imaging [8.74917075651321]
Multimodal fusion learning has shown significant promise in classifying various diseases such as skin cancer and brain tumors. Existing methods face three key limitations. We propose DRIFA, a dual-attention module that can be integrated with any deep neural network, forming a multimodal fusion learning framework denoted as DRIFA-Net.
arXiv Detail & Related papers (2024-12-02T08:11:12Z)
- HyperMM : Robust Multimodal Learning with Varying-sized Inputs [4.377889826841039]
HyperMM is an end-to-end framework designed for learning with varying-sized inputs.
We introduce a novel strategy for training a universal feature extractor using a conditional hypernetwork.
We experimentally demonstrate the advantages of our method in two tasks: Alzheimer's disease detection and breast cancer classification.
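A toy version of a conditional hypernetwork acting as a universal feature extractor: a small network emits the extractor's weights from a modality embedding, so one extractor serves inputs from any modality. The linear form and sizes are assumptions; HyperMM's extractor is more elaborate.
```python
# Toy conditional hypernetwork: a modality embedding generates the weights
# of a shared linear extractor. Shapes and conditioning are assumptions.
import torch
import torch.nn as nn

class ConditionalHyperExtractor(nn.Module):
    def __init__(self, n_modalities: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.embed = nn.Embedding(n_modalities, 32)
        self.hyper = nn.Linear(32, in_dim * out_dim + out_dim)  # emits W and b

    def forward(self, x, modality_id):
        params = self.hyper(self.embed(modality_id))  # (in*out + out,)
        W = params[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        b = params[self.in_dim * self.out_dim:]
        return torch.relu(x @ W.t() + b)  # one extractor, any modality

ext = ConditionalHyperExtractor(n_modalities=3, in_dim=64, out_dim=16)
feat = ext(torch.randn(8, 64), torch.tensor(1))  # modality #1 -> (8, 16)
```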
arXiv Detail & Related papers (2024-07-30T12:13:18Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
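A minimal dynamic modality gate of the kind the model's name suggests: per-sample softmax weights let an incongruent modality be down-weighted before fusion. The gating form is an assumption, not HCT-DMG's design.
```python
# Assumed form of dynamic modality gating: learned per-sample weights over
# modalities, applied before fusion.
import torch
import torch.nn as nn

class DynamicModalityGate(nn.Module):
    def __init__(self, dim: int, n_modalities: int):
        super().__init__()
        self.gate = nn.Linear(n_modalities * dim, n_modalities)

    def forward(self, feats):  # feats: (B, n_modalities, dim)
        B, M, D = feats.shape
        w = torch.softmax(self.gate(feats.reshape(B, M * D)), dim=-1)
        return (w.unsqueeze(-1) * feats).sum(dim=1)  # gated fusion

fused = DynamicModalityGate(dim=64, n_modalities=3)(torch.randn(4, 3, 64))
```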
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can support better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose each input modality into a modality-specific appearance code and a modality-invariant content code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BraTS challenge dataset.
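A sketch of gated fusion over per-modality content codes with an availability mask, showing how absent modalities can be excluded at fusion time; the per-pixel scalar gate is an assumed form, not the paper's exact design.
```python
# Assumed gated fusion over content codes with a modality-availability mask:
# missing modalities get zero weight and the rest are renormalized.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Conv2d(dim, 1, kernel_size=1)  # per-pixel scalar gate

    def forward(self, codes, available):
        # codes: (B, M, C, H, W) content codes; available: (B, M) bool mask
        g = torch.sigmoid(self.gate(codes.flatten(0, 1)))      # (B*M, 1, H, W)
        g = g.view(*codes.shape[:2], 1, *codes.shape[3:])      # (B, M, 1, H, W)
        g = g * available[..., None, None, None]               # zero out missing
        return (g * codes).sum(1) / g.sum(1).clamp_min(1e-6)   # weighted mean

fused = GatedFusion(32)(
    torch.randn(2, 4, 32, 16, 16),
    torch.tensor([[1, 1, 0, 1], [1, 0, 1, 1]], dtype=torch.bool))
```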
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.