MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data
- URL: http://arxiv.org/abs/2508.09182v1
- Date: Thu, 07 Aug 2025 12:46:26 GMT
- Title: MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data
- Authors: Baraa Al Jorf, Farah Shamout
- Abstract summary: Real-world medical data is heterogeneous in nature, limited in size, and sparse due to missing modalities. Inspired by clinical prediction tasks, we introduce MedPatch, which seamlessly integrates multiple modalities via confidence-guided patching. We evaluate MedPatch using real-world data consisting of clinical time-series data, chest X-ray images, radiology reports, and discharge notes extracted from the MIMIC-IV, MIMIC-CXR, and MIMIC-Notes datasets.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clinical decision-making relies on the integration of information across various data modalities, such as clinical time-series, medical images, and textual reports. Compared to other domains, real-world medical data is heterogeneous in nature, limited in size, and sparse due to missing modalities. This significantly limits model performance in clinical prediction tasks. Inspired by clinical workflows, we introduce MedPatch, a multi-stage multimodal fusion architecture that seamlessly integrates multiple modalities via confidence-guided patching. MedPatch comprises three main components: (i) a multi-stage fusion strategy that leverages joint and late fusion simultaneously, (ii) a missingness-aware module that handles sparse samples with missing modalities, and (iii) a joint fusion module that clusters latent token patches based on calibrated unimodal token-level confidence. We evaluated MedPatch on two benchmark tasks, in-hospital mortality prediction and clinical condition classification, using real-world data consisting of clinical time-series data, chest X-ray images, radiology reports, and discharge notes extracted from the MIMIC-IV, MIMIC-CXR, and MIMIC-Notes datasets. Compared to existing baselines, MedPatch achieves state-of-the-art performance. Our work highlights the effectiveness of confidence-guided multi-stage fusion in addressing the heterogeneity of multimodal data, and establishes new state-of-the-art benchmark results for clinical prediction tasks.
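The abstract's third component, clustering latent token patches by calibrated unimodal token-level confidence, is the most distinctive idea, and the page gives no code. The sketch below is a minimal illustration of what confidence-guided patching could look like; the temperature scaling, the threshold `tau`, and the mean-pooled fusion are all assumptions, not the authors' implementation.

```python
# Illustrative sketch only, not the MedPatch implementation. Assumed scheme:
# per-token confidence = max softmax probability after temperature scaling,
# and "patching" = pooling tokens into high-/low-confidence groups.
import torch

def token_confidence(logits: torch.Tensor, temperature: float = 1.5) -> torch.Tensor:
    """Calibrated per-token confidence (temperature scaling is an assumption)."""
    return torch.softmax(logits / temperature, dim=-1).max(dim=-1).values

def pool(parts: list, dim: int = 64) -> torch.Tensor:
    """Mean-pool a list of (n, dim) token tensors, tolerating empty groups."""
    cat = torch.cat(parts)
    return cat.mean(dim=0) if cat.numel() else torch.zeros(dim)

# Toy unimodal token streams, e.g., time-series tokens and image patch tokens.
ts_tokens, ts_logits = torch.randn(16, 64), torch.randn(16, 2)
img_tokens, img_logits = torch.randn(49, 64), torch.randn(49, 2)

tau = 0.7  # confidence threshold (assumed)
high, low = [], []
for tokens, logits in [(ts_tokens, ts_logits), (img_tokens, img_logits)]:
    conf = token_confidence(logits)
    high.append(tokens[conf >= tau])
    low.append(tokens[conf < tau])

# A joint fusion module would attend within each cross-modal confidence patch;
# mean pooling stands in for that here.
fused = torch.cat([pool(high), pool(low)])  # feeds a downstream classifier head
print(fused.shape)  # torch.Size([128])
```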
Related papers
- MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series [13.933658032225317]
We propose MedFuse, a framework for irregular clinical time series built around the multiplicative embedding fusion (MuFuse) module. Experiments on three real-world datasets show that MedFuse consistently outperforms state-of-the-art baselines on key predictive tasks. These results establish MedFuse as a generalizable approach for modeling irregular clinical time series.
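The summary names the multiplicative embedding fusion (MuFuse) module without further detail. Going only by that name, a multiplicative fusion step typically projects two embeddings to a shared width and combines them by element-wise product so one modality gates the other; the module below is a generic sketch of that pattern, not the paper's design.

```python
import torch
import torch.nn as nn

class MultiplicativeFusion(nn.Module):
    """Hadamard-product fusion of two embeddings (generic sketch, not MuFuse)."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)
        self.proj_b = nn.Linear(dim_b, dim_out)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.proj_a(a) * self.proj_b(b)  # element-wise gating

fuse = MultiplicativeFusion(dim_a=64, dim_b=32, dim_out=128)
print(fuse(torch.randn(8, 64), torch.randn(8, 32)).shape)  # torch.Size([8, 128])
```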
arXiv Detail & Related papers (2025-11-12T12:10:07Z) - ProbMed: A Probabilistic Framework for Medical Multimodal Binding [21.27709522688514]
We present Probabilistic Modality-Enhanced Diagnosis (ProbMED), which aligns four distinct modalities, including chest X-rays, electrocardiograms, and echocardiograms, into a unified probabilistic embedding space. Our model outperforms current medical vision-language pretraining models in cross-modality retrieval, zero-shot, and few-shot classification.
arXiv Detail & Related papers (2025-09-30T03:16:01Z) - MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis. It adapts pre-trained models to new tasks while refining their representational capacity, and consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z) - A Language-Signal-Vision Multimodal Framework for Multitask Cardiac Analysis [37.18952260878238]
We develop Textual Guidance Multimodal fusion for Multiple cardiac tasks (TGMM). The study systematically explores key features across multiple modalities and elucidates their synergistic contributions to clinical decision-making.
arXiv Detail & Related papers (2025-08-18T16:43:31Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
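The masked-patch imputation idea can be sketched in a few lines: missing-modality patches are swapped for a learned mask token, a transformer encodes the mixed sequence, and a reconstruction loss is taken on the masked positions during pre-training. The stand-in below is MAE-style; the layer counts, dimensions, and loss are assumptions, not the impuTMAE architecture.

```python
import torch
import torch.nn as nn

class MaskedImputer(nn.Module):
    """MAE-style stand-in: swap missing-modality patches for a learned mask
    token, encode, and reconstruct the masked patches (not impuTMAE itself)."""
    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, dim)  # reconstruct patch embeddings

    def forward(self, patches: torch.Tensor, missing: torch.Tensor):
        # patches: (batch, n_patches, dim); missing: (batch, n_patches) bool
        x = torch.where(missing.unsqueeze(-1),
                        self.mask_token.expand_as(patches), patches)
        recon = self.decoder(self.encoder(x))
        loss = ((recon - patches) ** 2)[missing].mean()  # loss on masked patches only
        return recon, loss

model = MaskedImputer()
patches = torch.randn(2, 10, 64)   # patch embeddings from all modalities
missing = torch.rand(2, 10) < 0.3  # simulate absent-modality patches
missing[0, 0] = True               # ensure at least one masked position
recon, loss = model(patches, missing)
print(recon.shape, loss.item())
```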
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.<n>Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.<n>Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.<n>Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Completed Feature Disentanglement Learning for Multimodal MRIs Analysis [36.32164729310868]
Feature disentanglement (FD)-based methods have achieved significant success in multimodal learning (MML). We propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the information lost during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of the multimodal inputs.
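As a hedged illustration of feature disentanglement in this spirit (not the paper's CFD strategy, which additionally decouples shared features among modality subsets), each modality can be split into shared and specific parts, with losses that align shared features across modalities and penalize overlap with the specific ones. The encoders and loss weighting below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Disentangler(nn.Module):
    """Split one modality's features into shared and specific parts
    (generic FD sketch, not the paper's CFD strategy)."""
    def __init__(self, dim_in: int, dim_feat: int):
        super().__init__()
        self.shared = nn.Linear(dim_in, dim_feat)
        self.specific = nn.Linear(dim_in, dim_feat)

    def forward(self, x: torch.Tensor):
        return self.shared(x), self.specific(x)

def disentangle_loss(sh_a, sp_a, sh_b, sp_b):
    align = F.mse_loss(sh_a, sh_b)  # shared features should agree across modalities
    overlap = (F.cosine_similarity(sp_a, sp_b).abs().mean()
               + F.cosine_similarity(sh_a, sp_a).abs().mean())  # specific ones should not
    return align + 0.1 * overlap  # 0.1 is an assumed weighting

enc_a, enc_b = Disentangler(64, 32), Disentangler(48, 32)
sh_a, sp_a = enc_a(torch.randn(8, 64))
sh_b, sp_b = enc_b(torch.randn(8, 48))
print(disentangle_loss(sh_a, sp_a, sh_b, sp_b).item())
```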
arXiv Detail & Related papers (2024-07-06T01:49:38Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose C2M-DoT, a cross-modal consistent multi-view medical report generation framework with a domain transfer network.
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Cross-modality Attention-based Multimodal Fusion for Non-small Cell Lung Cancer (NSCLC) Patient Survival Prediction [0.6476298550949928]
We propose a cross-modality attention-based multimodal fusion pipeline designed to integrate modality-specific knowledge for patient survival prediction in non-small cell lung cancer (NSCLC).
Compared with single-modality models, which achieved c-indices of 0.5772 and 0.5885 using solely tissue image data or RNA-seq data, respectively, the proposed fusion approach achieved a c-index of 0.6587 in our experiment.
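Cross-modality attention generally means queries from one modality attend over keys and values from the other; the block below is a minimal sketch of one such step. The token shapes, single-block design, and modality roles are assumptions for illustration, not the paper's pipeline.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One cross-attention block: queries from one modality attend to keys and
    values from the other (generic sketch, not the paper's pipeline)."""
    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, q_tokens: torch.Tensor, kv_tokens: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(q_tokens, kv_tokens, kv_tokens)
        return self.norm(q_tokens + attended)  # residual + layer norm

block = CrossModalAttention()
img = torch.randn(2, 49, 64)  # e.g., tissue-image patch features
rna = torch.randn(2, 20, 64)  # e.g., RNA-seq feature tokens
print(block(img, rna).shape)  # image tokens enriched with RNA-seq context: (2, 49, 64)
```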
arXiv Detail & Related papers (2023-08-18T21:42:52Z) - MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images [3.6615129560354527]
Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as those in audio-visual applications, healthcare data is often collected asynchronously.
We propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input.
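One way to read "LSTM-based fusion that accommodates uni-modal as well as multi-modal input" is to treat the available modality embeddings as a short sequence fed to an LSTM, so a sample with one modality is simply a length-one sequence. The sketch below follows that reading; the dimensions and prediction head are assumptions.

```python
import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    """Fuse per-modality embeddings by treating them as a short sequence, so a
    uni-modal sample is just a length-one sequence (sketch of the idea only)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)  # e.g., an in-hospital mortality logit

    def forward(self, modality_feats: list) -> torch.Tensor:
        seq = torch.stack(modality_feats, dim=1)  # (batch, n_modalities, dim)
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])  # final hidden state summarizes available modalities

model = LSTMFusion()
ehr = torch.randn(4, 128)       # clinical time-series embedding
cxr = torch.randn(4, 128)       # chest X-ray embedding
print(model([ehr, cxr]).shape)  # multi-modal input: torch.Size([4, 1])
print(model([ehr]).shape)       # uni-modal input, same interface
```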
arXiv Detail & Related papers (2022-07-14T15:59:03Z) - Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use, at train time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
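Resilience to modality dropping is often trained by randomly hiding whole modalities during training, as in the hedged sketch below. This is a generic stand-in: CMIM's actual objective maximizes cross-modal information, which this snippet does not implement, and all names here are illustrative.

```python
import random
import torch

def drop_modalities(feats: dict, p_drop: float = 0.5) -> dict:
    """Randomly hide whole modalities at train time so the fused representation
    stays useful when views are missing at test time (generic stand-in)."""
    kept = {name: x for name, x in feats.items() if random.random() > p_drop}
    if not kept:  # always keep at least one view
        name = random.choice(list(feats))
        kept = {name: feats[name]}
    return kept

def fuse(feats: dict) -> torch.Tensor:
    return torch.stack(list(feats.values())).mean(dim=0)  # order-invariant pooling

feats = {"xray": torch.randn(8, 64), "report": torch.randn(8, 64)}
print(fuse(drop_modalities(feats)).shape)   # training: random subset of views
print(fuse({"xray": feats["xray"]}).shape)  # test: a single modality still works
```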
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.