MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data
- URL: http://arxiv.org/abs/2510.27321v1
- Date: Fri, 31 Oct 2025 09:47:58 GMT
- Title: MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data
- Authors: Yu-Chen Kuo, Yi-Ju Tseng
- Abstract summary: MedM2T is a time-aware multimodal framework designed to address medical data modeling challenges. We evaluate MedM2T on MIMIC-IV and MIMIC-IV-ECG datasets for three tasks that encompass chronic and acute disease dynamics. These results highlight the robustness and broad applicability of MedM2T, positioning it as a promising tool in clinical prediction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inherent multimodality and heterogeneous temporal structures of medical data pose significant challenges for modeling. We propose MedM2T, a time-aware multimodal framework designed to address these complexities. MedM2T integrates: (i) Sparse Time Series Encoder to flexibly handle irregular and sparse time series, (ii) Hierarchical Time-Aware Fusion to capture both micro- and macro-temporal patterns from multiple dense time series, such as ECGs, and (iii) Bi-Modal Attention to extract cross-modal interactions, which can be extended to any number of modalities. To mitigate granularity gaps between modalities, MedM2T uses modality-specific pre-trained encoders and aligns resulting features within a shared encoder. We evaluated MedM2T on MIMIC-IV and MIMIC-IV-ECG datasets for three tasks that encompass chronic and acute disease dynamics: 90-day cardiovascular disease (CVD) prediction, in-hospital mortality prediction, and ICU length-of-stay (LOS) regression. MedM2T outperformed state-of-the-art multimodal learning frameworks and existing time series models, achieving an AUROC of 0.947 and an AUPRC of 0.706 for CVD prediction; an AUROC of 0.901 and an AUPRC of 0.558 for mortality prediction; and a Mean Absolute Error (MAE) of 2.31 for LOS regression. These results highlight the robustness and broad applicability of MedM2T, positioning it as a promising tool in clinical prediction. We provide the implementation of MedM2T at https://github.com/DHLab-TSENG/MedM2T.
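To make the Bi-Modal Attention component concrete, here is a minimal PyTorch sketch of bidirectional cross-attention between two modality feature sequences. The module name, dimensions, and residual wiring are illustrative assumptions, not the authors' implementation (which is available at the linked repository).

```python
import torch
import torch.nn as nn

class BiModalAttention(nn.Module):
    """Cross-attention in both directions between two modality sequences.

    Illustrative sketch only; see https://github.com/DHLab-TSENG/MedM2T
    for the actual MedM2T implementation.
    """
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # A -> B: queries from modality A attend over modality B, and vice versa
        self.attn_ab = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # a: (batch, len_a, d_model), b: (batch, len_b, d_model)
        a_enriched, _ = self.attn_ab(query=a, key=b, value=b)
        b_enriched, _ = self.attn_ba(query=b, key=a, value=a)
        # residual connections keep each modality's own signal
        return a + a_enriched, b + b_enriched

# usage: fuse an encoded EHR time-series sequence with encoded ECG features
ehr = torch.randn(2, 16, 128)   # e.g., output of a sparse time series encoder
ecg = torch.randn(2, 32, 128)   # e.g., output of a hierarchical ECG encoder
fused_ehr, fused_ecg = BiModalAttention()(ehr, ecg)
```

Applying such a block pairwise is one natural way to extend it beyond two modalities, consistent with the abstract's claim.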
Related papers
- A Multimodal Deep Learning Framework for Predicting ICU Deterioration: Integrating ECG Waveforms with Clinical Data and Clinician Benchmarking [1.2938481923855734]
MDS ICU is a unified machine learning framework that fuses routinely collected data including demographics, biometrics, vital signs, laboratory values, ECG waveforms, surgical procedures, and medical device usage to provide continuous predictive support during ICU stays. We trained a deep learning architecture that combines structured state space (S4) encoders for ECG waveforms with multilayer perceptron (RealMLP) encoders for tabular data to jointly predict 33 clinically relevant outcomes spanning mortality, organ dysfunction, medication needs, and acute deterioration. Analysis showed close agreement between predicted and observed risks, with consistent gains from ECG waveform integration.
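As a rough sketch of the described design, assuming PyTorch, with simple stand-ins for the S4 and RealMLP encoders (all names, dimensions, and layer choices here are hypothetical):

```python
import torch
import torch.nn as nn

class MultiTaskICUModel(nn.Module):
    """Sketch: a waveform encoder (stand-in for S4) plus a tabular encoder
    (stand-in for RealMLP), fused and read out by 33 shared-trunk heads."""
    def __init__(self, ecg_channels=12, tab_dim=64, hidden=128, n_tasks=33):
        super().__init__()
        # stand-in for the structured state space (S4) ECG encoder
        self.ecg_encoder = nn.Sequential(
            nn.Conv1d(ecg_channels, hidden, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # stand-in for the RealMLP tabular encoder
        self.tab_encoder = nn.Sequential(
            nn.Linear(tab_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # one logit per clinical outcome, sharing the fused representation
        self.heads = nn.Linear(2 * hidden, n_tasks)

    def forward(self, ecg, tab):
        # ecg: (batch, channels, time), tab: (batch, tab_dim)
        z_ecg = self.ecg_encoder(ecg).squeeze(-1)   # (batch, hidden)
        z_tab = self.tab_encoder(tab)               # (batch, hidden)
        return self.heads(torch.cat([z_ecg, z_tab], dim=-1))  # 33 logits
```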
arXiv Detail & Related papers (2026-01-10T18:11:12Z)
- Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution [42.85462513661566]
We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a fine-tuned ClinicalModernBERT transformer for notes. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model while maintaining well-calibrated predictions.
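The late-fusion step described here amounts to probability stacking. A minimal scikit-learn sketch, with random placeholder arrays standing in for the BiLSTM and ClinicalModernBERT outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# placeholders for the two modality-specific probability streams
rng = np.random.default_rng(0)
p_vitals = rng.random(500)     # stand-in: BiLSTM probabilities from vitals
p_notes = rng.random(500)      # stand-in: transformer probabilities from notes
y = rng.integers(0, 2, 500)    # stand-in: in-hospital mortality labels

# logistic regression learns how much to trust each modality
X = np.column_stack([p_vitals, p_notes])
stacker = LogisticRegression().fit(X, y)
p_fused = stacker.predict_proba(X)[:, 1]  # late-fusion ensemble prediction
```

In practice the stacker would be fit on held-out predictions rather than the training folds, to avoid leakage.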
arXiv Detail & Related papers (2025-11-19T20:11:49Z)
- MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series [13.933658032225317]
We propose MedFuse, a framework for irregular clinical time series based on the MuFuse (Multiplicative Embedding Fusion) module. Experiments on three real-world datasets show that MedFuse consistently outperforms state-of-the-art baselines on key predictive tasks. These results establish MedFuse as a generalizable approach for modeling irregular clinical time series.
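The summary gives no implementation details, but multiplicative fusion is commonly realized as element-wise gating of one embedding stream by another. A hedged sketch (the gating design and dimensions are assumptions, not the published MuFuse module):

```python
import torch
import torch.nn as nn

class MultiplicativeFusion(nn.Module):
    """Value embeddings are modulated element-wise (Hadamard product) by a
    gate computed from a second stream, e.g. time-gap or feature-identity
    embeddings for irregular series. Illustrative assumption, not MuFuse."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, value_emb: torch.Tensor, context_emb: torch.Tensor):
        # multiplicative interaction instead of additive concatenation
        return value_emb * self.gate(context_emb)

fused = MultiplicativeFusion()(torch.randn(8, 20, 64), torch.randn(8, 20, 64))
```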
arXiv Detail & Related papers (2025-11-12T12:10:07Z)
- Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment [16.824692012617334]
DiPro is a novel framework that addresses these challenges through region-aware disentanglement and multi-timescale alignment. Experiments on the MIMIC dataset demonstrate that DiPro effectively extracts temporal clinical dynamics.
arXiv Detail & Related papers (2025-10-13T08:02:36Z)
- Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models [6.169451756799087]
There is an urgent need for a unified multimodal AI framework that can proactively predict individual health risks. We propose VL-RiskFormer, a hierarchical stacked vision-language multimodal Transformer with a large language model (LLM) inference head embedded in its top layer.
arXiv Detail & Related papers (2025-09-22T05:26:59Z)
- MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis. It adapts pre-trained models to new tasks while refining their representational capacity. It consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z)
- MedNet-PVS: A MedNeXt-Based Deep Learning Model for Automated Segmentation of Perivascular Spaces [7.639673588124668]
Enlarged perivascular spaces (PVS) are recognized as biomarkers of cerebral small vessel disease, Alzheimer's disease, stroke, and aging-related neurodegeneration. We adapted MedNeXt-L-k5, a Transformer-inspired 3D encoder-decoder convolutional network, for automated PVS segmentation.
arXiv Detail & Related papers (2025-08-27T20:24:12Z)
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
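A minimal sketch of the masked-reconstruction mechanism described above, assuming PyTorch; treating a fully missing modality as a run of masked patches follows the abstract, while every dimension and layer choice is illustrative:

```python
import torch
import torch.nn as nn

class MaskedMultimodalAE(nn.Module):
    """Patch embeddings (from any modality) are replaced by a learned mask
    token, a shared transformer encodes the sequence, and a decoder
    reconstructs the masked patches. Sketch only, not impuTMAE itself."""
    def __init__(self, dim=128, n_heads=4, depth=2):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decoder = nn.Linear(dim, dim)

    def forward(self, patches: torch.Tensor, mask: torch.Tensor):
        # patches: (batch, n_patches, dim); mask: (batch, n_patches) bool,
        # True where a patch is masked (an absent modality is all-True)
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(patches), patches)
        recon = self.decoder(self.encoder(x))
        loss = ((recon - patches) ** 2)[mask].mean()  # masked positions only
        return loss, recon
```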
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
- Explainable Spatio-Temporal GCNNs for Irregular Multivariate Time Series: Architecture and Application to ICU Patient Data [7.433698348783128]
We present XST-CNN (eXplainable Spatio-Temporal Graph Convolutional Neural Network), a novel architecture for processing heterogeneous and irregular Multivariate Time Series (MTS) data.
Our approach captures temporal and feature dependencies within a unified spatio-temporal pipeline by leveraging a GCNN.
We evaluate XST-CNN using real-world Electronic Health Record data to predict Multidrug Resistance (MDR) in ICU patients.
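As a sketch of the basic graph-convolution step such a pipeline builds on (a generic GCN layer, x' = ReLU(Â·x·W); the normalized adjacency Â over clinical features/time steps is assumed precomputed, and nothing here is specific to XST-CNN):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: project node features, then mix them along
    the edges of a pre-normalized adjacency matrix."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor):
        # x: (batch, n_nodes, in_dim); adj_norm: (n_nodes, n_nodes)
        return torch.relu(adj_norm @ self.linear(x))
```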
arXiv Detail & Related papers (2024-11-01T22:53:17Z)
- CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis [46.56667527672019]
We introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings.
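Slot attention iteratively refines a small set of learned "slots" against the input embeddings, with attention normalized across slots so that slots compete for inputs. A compact sketch following Locatello et al. (2020); the hyperparameters and the application to EHR temporal embeddings are assumptions:

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Compact slot attention: slots compete for input tokens (softmax over
    the slot axis), aggregate a weighted mean, and update via a GRU cell."""
    def __init__(self, dim=64, n_slots=8, iters=3):
        super().__init__()
        self.n_slots, self.iters, self.scale = n_slots, iters, dim ** -0.5
        self.slots_init = nn.Parameter(torch.randn(1, n_slots, dim) * 0.02)
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, inputs: torch.Tensor):
        # inputs: (batch, n_tokens, dim) temporal embeddings
        b, _, d = inputs.shape
        slots = self.slots_init.expand(b, -1, -1)
        k, v = self.to_k(inputs), self.to_v(inputs)
        for _ in range(self.iters):
            q = self.to_q(slots)
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
            updates = attn @ v                      # (batch, n_slots, dim)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).view(b, self.n_slots, d)
        return slots  # one embedding per discovered temporal pattern
```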
arXiv Detail & Related papers (2024-11-01T15:54:07Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
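PMT builds on the mean-teacher paradigm, in which the teacher's weights track an exponential moving average (EMA) of the student's. A generic sketch of that update (the decay value and schedule are assumptions, not the paper's settings):

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.99) -> None:
    """teacher <- decay * teacher + (1 - decay) * student, per tensor.
    Called once per training step after the student's optimizer update."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```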
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma [4.578027879885667]
This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model.
The model employs self-supervised learning techniques to effectively encode the high-dimensional MRI input for integration with non-imaging data using cross-attention.
arXiv Detail & Related papers (2024-05-21T17:44:48Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can support better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.