Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
- URL: http://arxiv.org/abs/2503.09498v1
- Date: Wed, 12 Mar 2025 16:03:00 GMT
- Title: Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
- Authors: Nazanin Moradinasab, Saurav Sengupta, Jiebei Liu, Sana Syed, Donald E. Brown
- Abstract summary: We propose a new multimodal model called MoSARe, a deep learning framework that handles incomplete multimodal data. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. It provides reliable predictions even when some data are missing.
- Score: 0.8213829427624407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Healthcare relies on multiple types of data, such as medical images, genetic information, and clinical records, to improve diagnosis and treatment. However, missing data is a common challenge due to privacy restrictions, cost, and technical issues, making many existing multimodal models unreliable. To address this, we propose a new multimodal model called Mixture of Experts, Symmetric Aligning, and Reconstruction (MoSARe), a deep learning framework that handles incomplete multimodal data while maintaining high accuracy. MoSARe integrates expert selection, cross-modal attention, and contrastive learning to improve feature representation and decision-making. Our results show that MoSARe outperforms existing models when the data are complete. Furthermore, it provides reliable predictions even when some data are missing. This makes it especially useful in real-world healthcare settings, including resource-limited environments. Our code is publicly available at https://github.com/NazaninMn/MoSARe.
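The abstract does not include implementation details. As an illustration only, here is a minimal numpy sketch of how a mixture-of-experts fusion step might handle missing modalities: gate logits for absent modalities are masked to negative infinity so the softmax renormalizes over the modalities actually observed. All names, shapes, and the gating scheme are assumptions, not MoSARe's actual code.

```python
import numpy as np

def masked_moe(feats, gate_logits, present):
    """Illustrative mixture-of-experts fusion for incomplete multimodal data.

    feats:       (batch, n_modalities, dim) per-modality feature vectors
    gate_logits: (batch, n_modalities) unnormalized gating scores
    present:     (batch, n_modalities) 0/1 mask of observed modalities
    """
    # Mask out missing modalities so the softmax ignores them entirely.
    logits = np.where(present.astype(bool), gate_logits, -np.inf)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # gate weights over present modalities
    return (w[..., None] * feats).sum(axis=-2)  # weighted fusion: (batch, dim)

# One sample, 3 modalities (e.g. image, genomics, notes), 4-dim features;
# the second modality is missing.
feats = np.arange(12, dtype=float).reshape(1, 3, 4)
present = np.array([[1, 0, 1]])
gate_logits = np.zeros((1, 3))
fused = masked_moe(feats, gate_logits, present)  # → [[4., 5., 6., 7.]]
```

With equal gate logits, the fused vector is simply the mean of the two observed modalities' features; a learned gate would weight them unevenly.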
Related papers
- What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning Methods [0.13194391758295113]
We present a method that measures how much the model relies on each modality in a dataset to fulfill its task.
We found that some networks have modality preferences that tend toward unimodal collapse, while some datasets are imbalanced from the ground up.
With our method we make a crucial contribution to the field of interpretability in deep learning based multimodal research.
arXiv Detail & Related papers (2025-02-28T12:39:39Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation [22.908801443059758]
We present MedCoDi-M, a model for multimodal medical data generation.
We benchmark it against five competitors on the MIMIC-CXR dataset.
We assess the utility of MedCoDi-M in addressing key challenges in the medical field.
arXiv Detail & Related papers (2025-01-08T16:53:56Z) - DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data [0.0]
Real-life medical data is often multimodal and incomplete, fueling the need for advanced deep learning models.
We introduce DRIM, a new method for capturing shared and unique representations, despite data sparsity.
Our method outperforms state-of-the-art algorithms on glioma patients survival prediction tasks, while being robust to missing modalities.
arXiv Detail & Related papers (2024-09-25T16:13:57Z) - Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data [8.536869574065195]
Multi-Modal Mixing Transformer (3MAT) is a disease classification transformer that not only leverages multi-modal data but also handles missing data scenarios.
We propose a novel modality dropout mechanism to ensure an unprecedented level of modality independence and robustness to handle missing data scenarios.
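The 3MAT abstract describes a modality dropout mechanism but not its implementation. As a hypothetical sketch of the general idea (not the paper's actual code), modality dropout randomly zeroes out entire modalities during training so the model cannot depend on any single one, which is what makes it robust when modalities are missing at test time:

```python
import numpy as np

def modality_dropout(feats, p=0.3, rng=None):
    """Illustrative modality dropout for training robust multimodal models.

    With probability p, zero out an entire modality's feature vector for
    each sample, while guaranteeing at least one modality survives.
    feats: (batch, n_modalities, dim)
    """
    if rng is None:
        rng = np.random.default_rng()
    batch, n_mod, _ = feats.shape
    keep = rng.random((batch, n_mod)) >= p        # True = modality kept
    dead = ~keep.any(axis=1)                      # samples that lost every modality
    # Revive one random modality for any fully-dropped sample.
    keep[dead, rng.integers(0, n_mod, dead.sum())] = True
    return feats * keep[..., None]
```

At inference time the function would simply not be applied; missing modalities then look exactly like the dropped ones seen during training.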
arXiv Detail & Related papers (2022-10-01T11:31:02Z) - Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z) - M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables, thereby obtaining the diversity in output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
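The paper's exact adversarial loss is not reproduced here. As an assumed, simplified illustration of a diversity objective for ensembles, one common form penalizes the mean pairwise similarity between member predictions; minimizing it pushes ensemble members toward different outputs:

```python
import numpy as np

def diversity_penalty(probs):
    """Illustrative ensemble-diversity term (not the paper's actual loss).

    probs: (n_members, n_classes) predicted distributions for one input.
    Returns mean pairwise cosine similarity between member predictions;
    1.0 means all members agree exactly, 0.0 means they are orthogonal.
    """
    normed = probs / np.linalg.norm(probs, axis=1, keepdims=True)
    sim = normed @ normed.T                 # pairwise cosine similarities
    n = len(probs)
    off_diag = sim.sum() - np.trace(sim)    # exclude each member's self-similarity
    return off_diag / (n * (n - 1))
```

In training, such a term would be subtracted from (or adversarially traded off against) the task loss so that accuracy and diversity are optimized jointly.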
arXiv Detail & Related papers (2020-03-10T03:10:41Z) - MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.