DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation
- URL: http://arxiv.org/abs/2506.11691v1
- Date: Fri, 13 Jun 2025 11:38:18 GMT
- Title: DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation
- Authors: Libin Lan, Hongxing Li, Zunhui Xia, Yudong Zhang
- Abstract summary: We propose a novel model, named Dynamic Modality-Aware Fusion Network (DMAF-Net). First, it introduces a Dynamic Modality-Aware Fusion (DMAF) module to suppress missing-modality interference. Second, it designs a synergistic Relation Distillation and Prototype Distillation framework to enforce global-local feature alignment. Third, it presents a Dynamic Training Monitoring (DTM) strategy to stabilize optimization under imbalanced missing rates.
- Score: 7.441945494253697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incomplete multi-modal medical image segmentation faces critical challenges from modality imbalance, including imbalanced modality missing rates and heterogeneous modality contributions. Because they rely on idealized assumptions of complete modality availability, existing methods fail to dynamically balance contributions and neglect the structural relationships between modalities, resulting in suboptimal performance in real-world clinical scenarios. To address these limitations, we propose a novel model, named Dynamic Modality-Aware Fusion Network (DMAF-Net). DMAF-Net adopts three key ideas. First, it introduces a Dynamic Modality-Aware Fusion (DMAF) module that suppresses missing-modality interference by combining transformer attention with adaptive masking, weighting modality contributions dynamically through attention maps. Second, it designs a synergistic Relation Distillation and Prototype Distillation framework that enforces global-local feature alignment via covariance consistency and masked graph attention, while ensuring semantic consistency through cross-modal class-specific prototype alignment. Third, it presents a Dynamic Training Monitoring (DTM) strategy that stabilizes optimization under imbalanced missing rates by tracking distillation gaps in real time, balancing convergence speeds across modalities by adaptively reweighting losses and scaling gradients. Extensive experiments on BraTS2020 and MyoPS2020 demonstrate that DMAF-Net outperforms existing methods for incomplete multi-modal medical image segmentation. Our code is available at https://github.com/violet-42/DMAF-Net.
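To make the two mechanisms the abstract leans on more concrete, below is a minimal PyTorch sketch of (a) attention-based fusion that masks out missing modalities and reads per-modality contribution weights off the attention map, and (b) gap-driven loss reweighting in the spirit of DTM. This is an illustrative sketch under assumed shapes and names (`DMAFBlock` and `DynamicTrainingMonitor` are hypothetical); the authors' actual implementation lives in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DMAFBlock(nn.Module):
    """Sketch of dynamic modality-aware fusion: one feature token per
    modality, fused by self-attention; missing modalities are masked
    so they receive zero attention weight."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor, present: torch.Tensor):
        # tokens:  (B, M, D) -- one pooled feature per modality
        # present: (B, M) bool -- True where the modality was acquired
        #          (each sample must have at least one modality present)
        fused, attn = self.attn(
            tokens, tokens, tokens,
            key_padding_mask=~present,  # True entries are excluded from attention
        )
        # attn (B, M, M): row i holds modality i's dynamic contribution weights
        return self.norm(tokens + fused), attn


class DynamicTrainingMonitor:
    """Sketch of DTM-style rebalancing: track an EMA of each modality's
    distillation gap and upweight the losses of lagging modalities."""

    def __init__(self, num_modalities: int, momentum: float = 0.9):
        self.momentum = momentum
        self.gaps = torch.zeros(num_modalities)

    def loss_weights(self, current_gaps: torch.Tensor) -> torch.Tensor:
        # exponential moving average of per-modality distillation gaps
        self.gaps = self.momentum * self.gaps \
            + (1.0 - self.momentum) * current_gaps.detach()
        # larger running gap -> larger weight; scaled to sum to M
        return F.softmax(self.gaps, dim=0) * len(self.gaps)


if __name__ == "__main__":
    B, M, D = 2, 4, 32  # e.g. the four BraTS MRI sequences
    block = DMAFBlock(D)
    tokens = torch.randn(B, M, D)
    present = torch.tensor([[True, True, False, True],
                            [True, False, False, True]])
    fused, attn = block(tokens, present)
    dtm = DynamicTrainingMonitor(M)
    w = dtm.loss_weights(torch.tensor([0.8, 1.5, 2.1, 0.4]))
    print(fused.shape, attn.shape, w)  # (2, 4, 32), (2, 4, 4), 4 weights
```

In the full model, the fused features would feed a segmentation decoder and the weights would multiply per-modality distillation losses; both details are simplified away here.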
Related papers
- RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment. The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module. Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z)
- Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport [28.96009174108652]
Multimodal ophthalmic imaging-based diagnosis integrates color fundus images with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. Existing commonly used pipelines, such as modality imputation and distillation methods, face notable limitations. We propose a novel multimodal alignment and fusion framework capable of robustly handling missing modalities in the task of ophthalmic diagnostics.
arXiv Detail & Related papers (2025-07-07T13:36:39Z)
- MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction [5.895727565919295]
This paper presents a Multimodal Representation Decoupling Network (MurreNet) to advance cancer survival analysis. MurreNet decomposes paired input data into modality-specific and modality-shared representations, thereby reducing redundancy between modalities. Experiments conducted on six TCGA cancer cohorts demonstrate that MurreNet achieves state-of-the-art (SOTA) performance in survival prediction.
arXiv Detail & Related papers (2025-07-07T11:26:29Z)
- MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across the MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
- Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency [0.0]
We propose Dynamic Modality Scheduling (DMS), a novel framework that adaptively adjusts the contribution of each modality at a per-sample level. Experimental results on VQA, image-text retrieval, and captioning tasks show that DMS significantly improves both clean and robust performance.
arXiv Detail & Related papers (2025-06-15T05:15:52Z)
- Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. Traditional encoder-decoder architectures may not have adequately considered robustness against noise originating from defective modalities. We propose ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Gradient-Guided Modality Decoupling for Missing-Modality Robustness [24.95911972867697]
We introduce gradients as a novel indicator to monitor and reduce modality dominance.
We present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities.
In addition, to flexibly handle modal-incomplete data, we design a parameter-efficient Dynamic Sharing framework.
arXiv Detail & Related papers (2024-02-26T05:50:43Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named the Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR).
arXiv Detail & Related papers (2022-08-16T08:02:30Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)