ProbMed: A Probabilistic Framework for Medical Multimodal Binding
- URL: http://arxiv.org/abs/2509.25711v1
- Date: Tue, 30 Sep 2025 03:16:01 GMT
- Title: ProbMed: A Probabilistic Framework for Medical Multimodal Binding
- Authors: Yuan Gao, Sangwook Kim, Jianzhong You, Chris McIntosh
- Abstract summary: We present Probabilistic Modality-Enhanced Diagnosis (ProbMED). ProbMED aligns four distinct modalities--chest X-rays, electrocardiograms, echocardiograms, and clinical text--into a unified probabilistic embedding space. Our model outperforms current medical vision-language pretraining models in cross-modality retrieval, zero-shot, and few-shot classification.
- Score: 21.27709522688514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical decision-making requires integrating diverse medical information, from imaging to clinical narratives. These medical modalities are often acquired in a many-to-many manner. However, current medical vision-language pretraining models (Med-VLPMs) fail to directly account for this many-to-many mapping in their model training and embeddings. To address this, we present Probabilistic Modality-Enhanced Diagnosis (ProbMED), a multimodal Med-VLPM that employs probabilistic contrastive learning to model distributions over embeddings rather than deterministic estimates. ProbMED aligns four distinct modalities--chest X-rays, electrocardiograms, echocardiograms, and clinical text--into a unified probabilistic embedding space. We use InfoNCE loss with Hellinger distance to integrate inter-modality distributions. We introduce a probabilistic synthetic sampling loss that captures modality-specific mean and variance to improve intra-modality binding. Extensive experiments across 13 medical datasets demonstrate that our model outperforms current Med-VLPMs in cross-modality retrieval, zero-shot, and few-shot classification. We also demonstrate the robust integration of multiple modalities for prognostication, showing improved intra- and inter-medical modality binding.
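As a rough sketch of the kind of objective the abstract describes, the snippet below contrasts diagonal-Gaussian embeddings from two modalities with an InfoNCE loss whose similarity comes from the closed-form Hellinger distance between Gaussians. Function names, the temperature value, and the symmetric loss form are illustrative assumptions, not details taken from the ProbMED paper.

```python
# Sketch of probabilistic contrastive alignment: InfoNCE over Hellinger
# distances between diagonal-Gaussian embedding distributions.
# All names here are illustrative, not from the ProbMED codebase.
import math
import torch
import torch.nn.functional as F

def pairwise_hellinger_sq(mu1, logvar1, mu2, logvar2):
    """Squared Hellinger distance H^2 in [0, 1] between every pair of
    diagonal Gaussians N(mu1[i], var1[i]) and N(mu2[j], var2[j])."""
    mu1, lv1 = mu1.unsqueeze(1), logvar1.unsqueeze(1)   # (N, 1, D)
    mu2, lv2 = mu2.unsqueeze(0), logvar2.unsqueeze(0)   # (1, M, D)
    v1, v2 = lv1.exp(), lv2.exp()
    # log Bhattacharyya coefficient for diagonal Gaussians
    log_bc = 0.5 * (math.log(2.0) + 0.5 * (lv1 + lv2)
                    - torch.log(v1 + v2)).sum(-1)
    log_bc = log_bc - 0.25 * ((mu1 - mu2) ** 2 / (v1 + v2)).sum(-1)
    return 1.0 - log_bc.exp()                            # (N, M)

def infonce_hellinger(mu_a, logvar_a, mu_b, logvar_b, temperature=0.1):
    """Symmetric InfoNCE: matched pairs (i, i) across two modalities
    should have smaller H^2 than mismatched pairs."""
    logits = -pairwise_hellinger_sq(mu_a, logvar_a, mu_b, logvar_b) / temperature
    targets = torch.arange(mu_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Identical distributions give H^2 = 0 and mismatched pairs approach 1, so negating the distance turns it into a similarity score usable as InfoNCE logits.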
Related papers
- MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata [5.963599483233974]
We introduce MetaVoxel, a generative joint diffusion modeling framework. We show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction.
arXiv Detail & Related papers (2025-12-10T19:47:52Z)
- Cross-Modal Alignment via Variational Copula Modelling [54.25504956780864]
It is essential to develop multimodal learning methods that aggregate diverse information from multiple modalities. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities. We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of the modalities.
arXiv Detail & Related papers (2025-11-05T05:28:28Z)
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
- CLIMD: A Curriculum Learning Framework for Imbalanced Multimodal Diagnosis [21.001994821490644]
We propose a Curriculum Learning framework for Imbalanced Multimodal Diagnosis (CLIMD). Specifically, we first design a multimodal curriculum measurer that combines intra-modal confidence and inter-modal complementarity, enabling the model to focus on key samples. As a plug-and-play CL framework, CLIMD can be easily integrated into other models, offering a promising path toward improving multimodal disease diagnosis accuracy.
arXiv Detail & Related papers (2025-08-03T05:25:12Z)
- Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications [10.096013178241117]
Multimodal large language models (MLLMs) can process and integrate information from multimodal sources, such as text and images. However, the uncertainties arising from individual uni-modal data, and their implications for potential clinical applications, are not yet fully understood. We propose a multimodal uncertainty propagation model (MUPM) based on uncertainty propagation.
arXiv Detail & Related papers (2025-07-17T09:34:21Z)
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. However, existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
- Cross-modality Attention-based Multimodal Fusion for Non-small Cell Lung Cancer (NSCLC) Patient Survival Prediction [0.6476298550949928]
We propose a cross-modality attention-based multimodal fusion pipeline designed to integrate modality-specific knowledge for patient survival prediction in non-small cell lung cancer (NSCLC). Compared with the single-modality baselines, which achieved c-indices of 0.5772 and 0.5885 using tissue image data or RNA-seq data alone, respectively, the proposed fusion approach achieved a c-index of 0.6587 in our experiment.
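For reference, Harrell's concordance index (c-index) reported above measures the fraction of comparable patient pairs whose predicted risk ordering matches their observed survival ordering. The minimal sketch below uses hypothetical names and simplified tie handling; it illustrates the metric, not the paper's evaluation code.

```python
# Minimal Harrell's c-index: fraction of comparable patient pairs whose
# predicted risk agrees with observed survival order. Hypothetical names;
# ties in predicted risk count as half-concordant.
from itertools import combinations

def c_index(times, events, risk_scores):
    """times: observed times; events: 1 if event occurred, 0 if censored;
    risk_scores: higher score = higher predicted risk (shorter survival)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # ensure i has the shorter observed time
        if not events[i] or times[i] == times[j]:
            continue  # earlier time censored, or tied times: not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a rise from ~0.58 to 0.66 is a meaningful gain.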
arXiv Detail & Related papers (2023-08-18T21:42:52Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
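One common way to make a multi-modal representation resilient to modality dropping is train-time modality dropout. The sketch below is a generic illustration of that idea under assumed names, not CMIM's actual mechanism.

```python
# Generic train-time modality dropout: randomly zero whole modality
# embeddings so the downstream fusion learns to cope with missing views.
# Illustrative only; not CMIM's actual training procedure.
import torch

def drop_modalities(embeddings, p_drop=0.3, training=True):
    """embeddings: list of (batch, dim) tensors, one per modality.
    Each modality is independently dropped with probability p_drop,
    but at least one modality is always kept."""
    if not training:
        return embeddings
    n = len(embeddings)
    mask = torch.rand(n) >= p_drop          # True = keep this modality
    if not mask.any():
        mask[torch.randint(n, (1,))] = True  # never drop everything
    return [e if keep else torch.zeros_like(e)
            for e, keep in zip(embeddings, mask)]
```

At test time the same zeroing can stand in for a genuinely missing modality, which is the failure mode the training randomization prepares the model for.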
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.