MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
- URL: http://arxiv.org/abs/2207.07027v1
- Date: Thu, 14 Jul 2022 15:59:03 GMT
- Title: MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
- Authors: Nasir Hayat, Krzysztof J. Geras, Farah E. Shamout
- Abstract summary: Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as those in audio-visual applications, data in healthcare is often collected asynchronously.
We propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal fusion approaches aim to integrate information from different
data sources. Unlike natural datasets, such as those in audio-visual applications,
where samples consist of "paired" modalities, data in healthcare is often
collected asynchronously. Hence, requiring the presence of all modalities for a
given sample is not realistic for clinical tasks and significantly limits the
size of the dataset during training. In this paper, we propose MedFuse, a
conceptually simple yet promising LSTM-based fusion module that can accommodate
uni-modal as well as multi-modal input. We evaluate the fusion method and
introduce new benchmark results for in-hospital mortality prediction and
phenotype classification, using clinical time-series data in the MIMIC-IV
dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more
complex multi-modal fusion strategies, MedFuse provides a performance
improvement by a large margin on the fully paired test set. It also remains
robust across the partially paired test set containing samples with missing
chest X-ray images. We release our code for reproducibility and to enable the
evaluation of competing models in the future.
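The fusion mechanism is simple enough to sketch. Below is a minimal, hypothetical PyTorch rendition of an LSTM-based fusion module that accepts uni-modal or multi-modal input: a clinical time-series representation and a projected chest X-ray feature are stacked into a short sequence and read by a fusion LSTM, and the X-ray token is simply omitted when the image is missing. The encoder choices, dimensions (76 EHR features, 25 labels), and the missing-modality handling are illustrative assumptions, not the authors' exact implementation; their released code is authoritative.

```python
# Illustrative sketch of an LSTM-based fusion module in the spirit of MedFuse.
# Hypothetical simplification; see the authors' released code for their version.
import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    def __init__(self, ehr_dim=76, img_feat_dim=512, hidden_dim=256, num_classes=25):
        super().__init__()
        # Modality-specific encoders: an LSTM for clinical time series and a
        # projection standing in for a pretrained CXR image encoder (e.g. a CNN).
        self.ehr_encoder = nn.LSTM(ehr_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # The fusion LSTM reads the modality representations as a short sequence.
        self.fusion_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, ehr_seq, img_feat=None):
        # ehr_seq: (batch, time, ehr_dim); img_feat: (batch, img_feat_dim) or None.
        _, (h_ehr, _) = self.ehr_encoder(ehr_seq)
        tokens = [h_ehr[-1]]                      # (batch, hidden_dim)
        if img_feat is not None:                  # skip the token if the CXR is missing
            tokens.append(self.img_proj(img_feat))
        fused_in = torch.stack(tokens, dim=1)     # (batch, 1 or 2, hidden_dim)
        _, (h_fused, _) = self.fusion_lstm(fused_in)
        return self.classifier(h_fused[-1])       # logits

model = LSTMFusion()
ehr = torch.randn(4, 48, 76)        # e.g. 48 hourly steps of 76 EHR features
cxr = torch.randn(4, 512)           # precomputed image features
paired_logits = model(ehr, cxr)     # multi-modal sample
ehr_only_logits = model(ehr)        # uni-modal sample with a missing X-ray
```

Treating the modality representations as a sequence lets the same module consume one or two tokens, which is what allows training on partially paired data rather than only on fully paired samples.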
Related papers
- Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets
This work benchmarks two large-scale chest X-ray (CXR) embedding models on the public MIMIC-CXR and NIH ChestX-ray14 datasets.
We extracted embeddings directly from pre-trained encoders, trained lightweight LightGBM classifiers on multiple disease labels, and reported mean AUROC and F1-score with 95% confidence intervals.
arXiv Detail & Related papers (2025-12-03T12:55:44Z) - MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series
We propose MedFuse, a framework for irregular clinical time series based on the MuFuse module.
Experiments on three real-world datasets show that MedFuse consistently outperforms state-of-the-art baselines on key predictive tasks.
These results establish MedFuse as a generalizable approach for modeling irregular clinical time series.
arXiv Detail & Related papers (2025-11-12T12:10:07Z) - ProbMed: A Probabilistic Framework for Medical Multimodal Binding
We present Probabilistic Modality-Enhanced Diagnosis (ProbMED).
ProbMED aligns distinct modalities, including chest X-rays, electrocardiograms, and echocardiograms, into a unified probabilistic embedding space.
Our model outperforms current medical vision-language pretraining models in cross-modality retrieval, zero-shot, and few-shot classification.
arXiv Detail & Related papers (2025-09-30T03:16:01Z) - Multimodal Medical Image Classification via Synergistic Learning Pre-training
We propose a novel framework for multimodal semi-supervised medical image classification.
By treating one modality as an augmented sample of another, we implement self-supervised pre-training.
During the fine-tuning stage, we use separate encoders to extract features from the original modalities.
arXiv Detail & Related papers (2025-09-22T08:21:19Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.
It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.
Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data
Real-world medical data is heterogeneous in nature, limited in size, and sparse due to missing modalities.
Inspired by clinical prediction tasks, we introduce MedPatch, which seamlessly integrates multiple modalities via confidence-guided patching.
We evaluate MedPatch using real-world data consisting of clinical time-series data, chest X-ray images, radiology reports, and discharge notes extracted from the MIMIC-IV, MIMIC-CXR, and MIMIC-Notes datasets.
arXiv Detail & Related papers (2025-08-07T12:46:26Z) - Cross-Sequence Semi-Supervised Learning for Multi-Parametric MRI-Based Visual Pathway Delineation
We propose a novel semi-supervised multi-parametric feature decomposition framework for VP delineation.
Specifically, a correlation-constrained feature decomposition (CFD) is designed to handle the complex cross-sequence relationships.
We validate our framework using two public datasets and one in-house Multi-Shell Diffusion MRI (MDM) dataset.
arXiv Detail & Related papers (2025-05-26T09:18:58Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
We present MedCoDi-M, a model for multimodal medical data generation.
We benchmark it against five competitors on the MIMIC-CXR dataset.
We assess the utility of MedCoDi-M in addressing key challenges in the medical field.
arXiv Detail & Related papers (2025-01-08T16:53:56Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities.
We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models
MEDFuse is a framework that integrates structured and unstructured medical data.
It achieves over 90% F1 score in the 10-disease multi-label classification task.
arXiv Detail & Related papers (2024-07-17T04:17:09Z) - FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
The cross-fusion transformer effectively utilizes features at the cellular, tissue, and tumor-heterogeneity levels to correlate with prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet autoencoder to reconstruct information lost within modalities.
arXiv Detail & Related papers (2024-05-13T12:39:08Z) - MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions.
Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records.
This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis.
Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
arXiv Detail & Related papers (2024-03-22T19:19:51Z) - Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities
In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions.
We propose a solution by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL.
arXiv Detail & Related papers (2024-01-07T23:45:01Z) - HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and Multi-omic data across four cancer datasets from The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of the Score-based Generative Model (SGM) framework.
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z) - Cross-Modal Information Maximization for Medical Imaging: CMIM
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test time (a generic sketch of such modality dropout appears after this list).
arXiv Detail & Related papers (2020-10-20T20:05:35Z) - MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
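Several entries above (CMIM, MedPatch, and MedFuse itself) turn on robustness to modalities that are absent at test time. As referenced from the CMIM summary, the sketch below shows one common, generic way to train for this: randomly hiding modalities during training. It is a hypothetical illustration over precomputed per-modality features, not the specific objective of any paper listed here; the feature dictionary and dimensions are assumptions.

```python
# Generic modality-dropout training step: randomly hide a modality during
# training so the model learns representations that survive missing inputs
# at test time. Illustrative only; not the exact mechanism of any paper above.
import torch

def modality_dropout(features, p_drop=0.3):
    """features: dict of modality name -> (batch, dim) tensor.
    Drop each modality with probability p_drop, keeping at least one."""
    names = list(features)
    kept = {n: f for n, f in features.items() if torch.rand(()) > p_drop}
    if not kept:  # never drop everything
        n = names[torch.randint(len(names), ()).item()]
        kept = {n: features[n]}
    # Dropped modalities are replaced by zeros so tensor shapes stay fixed.
    return {n: features[n] if n in kept else torch.zeros_like(features[n])
            for n in names}

batch = {"ehr": torch.randn(4, 256), "cxr": torch.randn(4, 256)}
augmented = modality_dropout(batch)  # feed this to the fusion model instead
```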
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.