Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
- URL: http://arxiv.org/abs/2509.18284v1
- Date: Mon, 22 Sep 2025 18:12:12 GMT
- Title: Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
- Authors: Yi Gu, Kuniaki Saito, Jiaxin Ma
- Abstract summary: We propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning.
- Score: 17.717216490402482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to effectively fuse heterogeneous information while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning to address real-world limitations such as modality imbalance and missingness. Our approach introduces learnable modality tokens to improve missingness-aware fusion of modalities and augments conventional unimodal contrastive objectives with fused multimodal representations. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks, encompassing both visual and tabular modalities. Experimental results demonstrate that our method achieves state-of-the-art performance, particularly in challenging and practical scenarios where only a single modality is available. Furthermore, we show its adaptability through successful integration with a recent CT foundation model. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning, offering a scalable, low-cost solution with significant potential for real-world clinical applications. The code is available at https://github.com/omron-sinicx/medical-modality-dropout.
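To ground the two ingredients named in the abstract, the sketch below shows a PyTorch modality-dropout layer that substitutes a learnable per-modality token whenever a modality is dropped during training or missing at inference, plus an InfoNCE-style loss that contrasts each unimodal embedding against the fused multimodal representation. All module names, shapes, and the dropout rate are illustrative assumptions; the authors' actual implementation lives in the linked repository.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) modality dropout with learnable modality tokens that stand in for
#     missing inputs, and (2) a contrastive objective that also pulls the
#     fused multimodal representation toward each unimodal view.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenizedModalityDropout(nn.Module):
    def __init__(self, n_modalities: int, dim: int, p_drop: float = 0.3):
        super().__init__()
        # One learnable token per modality, used whenever that modality
        # is dropped during training or absent at inference.
        self.tokens = nn.Parameter(torch.randn(n_modalities, dim) * 0.02)
        self.p_drop = p_drop

    def forward(self, feats, present_mask):
        # feats: (B, M, D) per-modality embeddings
        # present_mask: (B, M) bool, True where the modality is observed
        if self.training:
            drop = torch.rand_like(present_mask, dtype=torch.float) < self.p_drop
            present_mask = present_mask & ~drop
        tokens = self.tokens.unsqueeze(0).expand_as(feats)
        return torch.where(present_mask.unsqueeze(-1), feats, tokens)

def fused_contrastive_loss(unimodal, fused, temperature: float = 0.1):
    # unimodal: list of (B, D) embeddings; fused: (B, D) fused embedding.
    # Standard InfoNCE between each unimodal view and the fused view, so
    # the fused representation joins the contrastive objective.
    loss = 0.0
    for z in unimodal:
        za = F.normalize(z, dim=-1)
        zb = F.normalize(fused, dim=-1)
        logits = za @ zb.t() / temperature          # (B, B) similarities
        target = torch.arange(za.size(0), device=za.device)
        loss = loss + F.cross_entropy(logits, target)
    return loss / len(unimodal)
```

Because the same tokens stand in for genuinely absent modalities at test time, the model can still be evaluated with a single modality present, which is the scenario where the abstract reports the largest gains.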
Related papers
- Seeking Necessary and Sufficient Information from Multimodal Medical Data [25.069100836193574]
Existing multimodal models often overlook learning features that are both necessary (must be present for the outcome to occur) and sufficient (enough to determine the outcome).
We argue that learning such features is crucial, as they can improve model performance by capturing essential predictive information.
Experiments on synthetic and real-world medical datasets demonstrate our method's effectiveness.
arXiv Detail & Related papers (2026-02-27T20:15:36Z)
- MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis [19.063517827476826]
We introduce MM-DINOv2, a novel framework that adapts the pre-trained vision foundation model DINOv2 for multi-modal medical imaging.
Our approach incorporates multi-modal patch embeddings, enabling vision foundation models to effectively process multi-modal imaging data.
Our method achieves a Matthews Correlation Coefficient (MCC) of 0.6 on an external test set, surpassing state-of-the-art supervised approaches by +11.1%.
arXiv Detail & Related papers (2025-09-08T12:34:15Z)
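To make "multi-modal patch embeddings" concrete, here is a minimal PyTorch sketch in which each imaging modality gets its own patch projection and the resulting token grids are summed before a shared ViT backbone. The per-modality Conv2d layers and the summation rule are our assumptions for illustration, not MM-DINOv2's actual design.

```python
# Illustrative sketch of multi-modal patch embeddings: one patch projection
# per modality, combined (here by summation) into a single token sequence.
import torch
import torch.nn as nn

class MultiModalPatchEmbed(nn.Module):
    def __init__(self, n_modalities: int, patch: int = 14, dim: int = 768):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            for _ in range(n_modalities)
        )

    def forward(self, x):
        # x: (B, M, H, W) one channel per modality (e.g. MRI sequences)
        tokens = 0
        for m, proj in enumerate(self.projs):
            t = proj(x[:, m : m + 1])                       # (B, D, H/p, W/p)
            tokens = tokens + t.flatten(2).transpose(1, 2)  # (B, N, D)
        return tokens  # feed into the shared DINOv2-style encoder
```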
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.
It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.
Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
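The masked pre-training idea can be summarized in a generic MAE-style step: mask a fraction of patch tokens across modalities, encode the visible ones, and reconstruct the masked ones, so missing-modality imputation falls out of the same objective. The sketch below is a simplified stand-in rather than the impuTMAE architecture, and the `decoder(latent, masked_ids)` signature is a hypothetical interface.

```python
# Generic MAE-style pre-training step over multimodal patch tokens.
import torch
import torch.nn.functional as F

def mae_step(encoder, decoder, tokens, mask_ratio=0.6):
    # tokens: (B, N, D) patch tokens pooled over all modalities
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    perm = torch.rand(B, N, device=tokens.device).argsort(dim=1)
    keep, masked = perm[:, :n_keep], perm[:, n_keep:]
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    latent = encoder(visible)              # encode visible tokens only
    # `decoder` is an assumed module taking latents plus masked-position ids
    pred = decoder(latent, masked)
    target = torch.gather(tokens, 1, masked.unsqueeze(-1).expand(-1, -1, D))
    return F.mse_loss(pred, target)        # loss on masked positions only
```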
- What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning Methods [0.13194391758295113]
We present a method that measures how important each modality in a dataset is to the model for fulfilling its task.
We found that some networks have modality preferences that tend toward unimodal collapse, while some datasets are imbalanced from the ground up.
With our method we make a crucial contribution to interpretability in deep-learning-based multimodal research.
arXiv Detail & Related papers (2025-02-28T12:39:39Z)
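A simple occlusion-style baseline conveys the spirit of measuring per-modality contribution: ablate one modality at evaluation time and record the performance drop. The paper's actual attribution method may differ; the `metric(model, loader, ablate=...)` interface below is a hypothetical placeholder.

```python
# Occlusion-style estimate of how much each modality contributes.
import torch

@torch.no_grad()
def modality_contribution(model, loader, metric, n_modalities):
    base = metric(model, loader, ablate=None)   # full-modality performance
    scores = {}
    for m in range(n_modalities):
        # `ablate=m` is assumed to zero out / token-replace modality m
        scores[m] = base - metric(model, loader, ablate=m)
    return scores  # larger drop => model leans harder on that modality
```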
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z)
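As a rough illustration of confidence-aware fusion, the sketch below derives a per-modality confidence from non-negative evidence and fuses logits with confidence weights. The papers above use Dirichlet-based evidential fusion; this softplus-evidence version is a deliberate simplification.

```python
# Simplified confidence-weighted fusion of per-modality classifier outputs.
import torch
import torch.nn.functional as F

def confidence_weighted_fusion(logits_per_modality):
    # logits_per_modality: list of (B, C) tensors, one per modality
    fused, total_w = 0.0, 0.0
    for logits in logits_per_modality:
        evidence = F.softplus(logits)                 # non-negative evidence
        conf = evidence.sum(dim=-1, keepdim=True)     # total evidence as confidence
        fused = fused + conf * logits
        total_w = total_w + conf
    return fused / total_w                            # (B, C) fused logits
```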
- DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency [18.291267748113142]
We propose DrFuse to achieve effective clinical multi-modal fusion.
We address the missing modality issue by disentangling the features shared across modalities and those unique within each modality.
We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR.
arXiv Detail & Related papers (2024-03-10T12:41:34Z)
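The shared/unique disentanglement can be illustrated with two simple penalties: align the shared projections across modalities, and push each modality's unique projection away from its shared one. The cosine-based losses and the 0.1 weight below are illustrative assumptions, not DrFuse's exact objectives.

```python
# Toy shared/unique disentanglement losses for two modalities.
import torch
import torch.nn.functional as F

def disentangle_losses(shared_a, shared_b, unique_a, unique_b):
    # shared_*/unique_*: (B, D) projections from two modalities
    align = 1 - F.cosine_similarity(shared_a, shared_b).mean()
    ortho = (
        F.cosine_similarity(shared_a, unique_a).abs().mean()
        + F.cosine_similarity(shared_b, unique_b).abs().mean()
    )
    return align + 0.1 * ortho  # shared parts agree; unique parts stay distinct
```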
- Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis [7.207158973042472]
Multi-modal data, combining whole slide images (WSIs) and clinical information, can improve the performance of deep learning models in diagnosing axillary lymph node metastasis.
We propose a bidirectional distillation framework consisting of a multi-modal branch and a single-modal branch.
Our approach not only achieves state-of-the-art performance with an AUC of 0.861 on the test set without missing data, but also yields an AUC of 0.842 when the modality missing rate is 80%.
arXiv Detail & Related papers (2024-01-03T05:59:48Z)
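A minimal reading of the bidirectional distillation idea: the multi-modal branch teaches the single-modal branch (so it survives missing inputs) and vice versa, via symmetric KL divergence on temperature-softened logits. The temperature and symmetric weighting below are assumptions.

```python
# Symmetric logit distillation between multi-modal and single-modal branches.
import torch.nn.functional as F

def bidirectional_distill(logits_multi, logits_single, T=2.0):
    p_m = F.log_softmax(logits_multi / T, dim=-1)
    p_s = F.log_softmax(logits_single / T, dim=-1)
    kl_ms = F.kl_div(p_s, p_m.exp(), reduction="batchmean")  # multi -> single
    kl_sm = F.kl_div(p_m, p_s.exp(), reduction="batchmean")  # single -> multi
    return (kl_ms + kl_sm) * T * T
```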
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
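The common-space projection can be sketched as one linear head per modality mapping its native feature width into a shared dimension, after which any available subset of modalities can be fused by simple averaging. Both the linear projectors and the mean fusion are our assumptions.

```python
# Project heterogeneous modality features into one shared space, so that
# modality subsets unseen during training can still be combined.
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    def __init__(self, in_dims, dim=256):
        super().__init__()
        # one projection head per modality, whatever its native width
        self.heads = nn.ModuleList(nn.Linear(d, dim) for d in in_dims)

    def forward(self, feats):
        # feats: list of (B, d_m) tensors; None marks an absent modality
        zs = [head(f) for head, f in zip(self.heads, feats) if f is not None]
        return torch.stack(zs, dim=0).mean(dim=0)  # (B, dim) shared-space fusion
```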
- Reliable Multimodality Eye Disease Screening via Mixture of Student's t Distributions [49.4545260500952]
We introduce a novel multimodality evidential fusion pipeline for eye disease screening, EyeMoSt.
Our model estimates both local uncertainty for unimodality and global uncertainty for the fusion modality to produce reliable classification results.
Our experimental findings on both public and in-house datasets show that our model is more reliable than current methods.
arXiv Detail & Related papers (2023-03-17T06:18:16Z)
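In the spirit of EyeMoSt's local uncertainty, the sketch below lets each modality head predict a location and a log-scale and fuses predictions by precision weighting, so low-uncertainty modalities dominate. Using a plain precision-weighted average instead of a full Student's t mixture is a simplification.

```python
# Precision-weighted fusion of per-modality predictions with learned scales.
import torch

def uncertainty_weighted_fusion(mus, log_scales):
    # mus, log_scales: lists of (B, C) tensors from per-modality heads
    precisions = [torch.exp(-2 * s) for s in log_scales]  # 1 / scale^2
    num = sum(p * mu for p, mu in zip(precisions, mus))
    den = sum(precisions)
    return num / den  # (B, C) fused prediction, confidence-weighted
```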
- Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion [71.87627318863612]
We propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities.
Our network uses feature disentanglement to decompose the input modalities into modality-specific appearance codes and a modality-invariant content code.
We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset.
arXiv Detail & Related papers (2020-02-22T14:32:04Z)
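Finally, gated fusion over modality-specific features can be sketched as a learned gate that decides, per spatial location, how much each available modality contributes, which degrades gracefully when a modality is absent. The shapes, the 1x1 Conv3d gate, and the renormalization below are illustrative assumptions.

```python
# Gated fusion of per-modality 3D feature maps with a missing-modality mask.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, n_modalities, ch):
        super().__init__()
        self.gate = nn.Conv3d(n_modalities * ch, n_modalities, kernel_size=1)

    def forward(self, feats, present):
        # feats: (B, M, C, D, H, W); present: (B, M) availability mask
        B, M, C, *sp = feats.shape
        g = self.gate(feats.flatten(1, 2))                  # (B, M, D, H, W)
        g = g.sigmoid() * present.view(B, M, 1, 1, 1)       # zero missing modalities
        g = g / g.sum(dim=1, keepdim=True).clamp_min(1e-6)  # renormalize weights
        return (feats * g.unsqueeze(2)).sum(dim=1)          # (B, C, D, H, W)
```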