Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction
- URL: http://arxiv.org/abs/2508.18632v1
- Date: Tue, 26 Aug 2025 03:18:25 GMT
- Title: Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction
- Authors: Huayi Wang, Haochao Ying, Yuyang Xu, Qibo Qiu, Cheng Zhang, Danny Z. Chen, Ying Sun, Jian Wu,
- Abstract summary: Cancer survival analysis commonly integrates information across diverse medical modalities to make survival-time predictions. Existing methods primarily focus on extracting decoupled features from each modality and performing fusion operations such as concatenation, attention, and MoE-based fusion. We propose a novel Decoupling-Reorganization-Fusion framework (DeReF), which devises a random feature reorganization strategy between the modality decoupling and dynamic MoE fusion modules.
- Score: 25.880454851313434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cancer survival analysis commonly integrates information across diverse medical modalities to make survival-time predictions. Existing methods primarily focus on extracting decoupled features from each modality and performing fusion operations such as concatenation, attention, and MoE-based (Mixture-of-Experts) fusion. However, these methods still face two key challenges: i) fixed fusion schemes (concatenation and attention) can lead to model over-reliance on predefined feature combinations, limiting the dynamic fusion of decoupled features; ii) in MoE-based fusion methods, each expert network handles separate decoupled features, which limits information interaction among the decoupled features. To address these challenges, we propose a novel Decoupling-Reorganization-Fusion framework (DeReF), which devises a random feature reorganization strategy between the modality decoupling and dynamic MoE fusion modules. Its advantages are: i) it increases the diversity and granularity of feature combinations, enhancing the generalization ability of the subsequent expert networks; ii) it overcomes the problem of information closure and helps expert networks better capture information among decoupled features. Additionally, we incorporate a regional cross-attention network within the modality decoupling module to improve the representation quality of decoupled features. Extensive experimental results on our in-house Liver Cancer (LC) dataset and three widely used TCGA public datasets confirm the effectiveness of our proposed method. The code will be made publicly available.
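The abstract describes three stages: decoupling each modality into shared and specific features, randomly reorganizing the decoupled features, and fusing them with a dynamic MoE. The sketch below is a minimal, illustrative PyTorch mock-up of how such a pipeline could be wired together; it is not the authors' released code, and the module names, dimensions, grouping scheme, and gating design are assumptions made for clarity (the regional cross-attention network and the survival loss are omitted).

```python
# Illustrative sketch only (assumptions, not the authors' implementation) of a
# Decouple -> Reorganize -> Fuse pipeline for multimodal survival prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupleReorganizeFuse(nn.Module):
    def __init__(self, dim=256, n_modalities=2, n_groups=2, n_experts=4):
        super().__init__()
        self.n_groups = n_groups
        # Decoupling: each modality is projected into a modality-shared and a
        # modality-specific component (a stand-in for the decoupling module).
        self.shared_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_modalities)])
        self.specific_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_modalities)])
        # Dynamic MoE fusion: a router produces softmax weights over experts.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(dim, n_experts)
        self.risk_head = nn.Linear(dim, 1)  # scalar survival-risk score

    def forward(self, feats):
        # feats: list of per-modality embeddings, each of shape (batch, dim)
        decoupled = []
        for i, x in enumerate(feats):
            decoupled.append(self.shared_proj[i](x))
            decoupled.append(self.specific_proj[i](x))
        tokens = torch.stack(decoupled, dim=1)  # (batch, T, dim), T = 2 * n_modalities

        # Random feature reorganization: during training, shuffle the decoupled
        # features before regrouping them, so the experts see varied feature
        # combinations rather than one fixed, closed assignment.
        if self.training:
            perm = torch.randperm(tokens.size(1), device=tokens.device)
            tokens = tokens[:, perm]
        groups = tokens.chunk(self.n_groups, dim=1)
        group_tokens = torch.stack([g.mean(dim=1) for g in groups], dim=1)  # (batch, n_groups, dim)

        # Dynamic MoE fusion: route every regrouped token through the experts
        # with softmax gating, then average over groups.
        gate = F.softmax(self.router(group_tokens), dim=-1)                       # (batch, n_groups, n_experts)
        expert_out = torch.stack([e(group_tokens) for e in self.experts], dim=2)  # (batch, n_groups, n_experts, dim)
        fused = (gate.unsqueeze(-1) * expert_out).sum(dim=2).mean(dim=1)          # (batch, dim)
        return self.risk_head(fused)


# Usage with two hypothetical modalities (e.g., pathology and genomics embeddings):
model = DecoupleReorganizeFuse()
risk = model([torch.randn(8, 256), torch.randn(8, 256)])
print(risk.shape)  # torch.Size([8, 1])
```

The key point the sketch tries to capture is that the permutation-and-regroup step changes which decoupled features each expert group receives on every training step, so no expert is locked to a fixed subset of features.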
Related papers
- Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation [6.302779966909783]
We propose a novel semi-supervised multi-modal framework for medical image segmentation. We introduce a Modality-specific Enhancing Module (MEM) to strengthen the semantic cues unique to each modality. We also introduce a learnable Complementary Information Fusion (CIF) module to adaptively exchange complementary knowledge between modalities.
arXiv Detail & Related papers (2025-12-10T16:15:17Z)
- IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection [23.256601188227865]
We propose an innovative feature fusion framework based on a cross-modal feature contrast and screening strategy. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features. IRDFusion consistently outperforms existing methods across diverse challenging scenarios.
arXiv Detail & Related papers (2025-09-11T01:22:35Z)
- Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning [1.5367554212163714]
We introduce Meta Fusion, a flexible and principled framework that unifies existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality.
arXiv Detail & Related papers (2025-07-27T00:50:29Z)
- Selective Complementary Feature Fusion and Modal Feature Compression Interaction for Brain Tumor Segmentation [14.457627015612827]
We propose a complementary feature compression interaction network (CFCI-Net), which realizes the complementary fusion and compression interaction of multi-modal feature information. CFCI-Net achieves superior results compared to state-of-the-art models.
arXiv Detail & Related papers (2025-03-20T13:52:51Z)
- Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion [25.140475569677758]
Multimodal image fusion aims to integrate information from different modalities to obtain a comprehensive image.
Existing methods tend to prioritize natural image fusion and focus on information complementarity and network training strategies.
This paper dissects the significant differences between the two tasks regarding fusion goals, statistical properties, and data distribution.
arXiv Detail & Related papers (2024-11-15T08:36:24Z)
- Completed Feature Disentanglement Learning for Multimodal MRIs Analysis [36.32164729310868]
Feature disentanglement (FD)-based methods have achieved significant success in multimodal learning (MML). We propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the lost information during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of multimodal inputs.
arXiv Detail & Related papers (2024-07-06T01:49:38Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information across modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition [46.443866373546726]
We focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos.
We propose a joint cross-attention model that relies on the complementary relationships between modalities to extract salient features.
Our proposed A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-28T14:09:43Z) - Learning Deep Multimodal Feature Representation with Asymmetric
Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
First, we verify that multimodal features can be learned within a single shared network by merely maintaining modality-specific batch normalization layers in the encoder.
Second, we propose a bidirectional multi-layer fusion scheme in which multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)