Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture
- URL: http://arxiv.org/abs/2509.15470v1
- Date: Thu, 18 Sep 2025 22:35:44 GMT
- Title: Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture
- Authors: Thomas Z. Li, Aravind R. Krishnan, Lianrui Zuo, John M. Still, Kim L. Sandler, Fabien Maldonado, Thomas A. Lasko, Bennett A. Landman
- Abstract summary: Multimodal models for pulmonary nodule diagnosis are limited by the scarcity of labeled data and a tendency to overfit on the training distribution. We leverage self-supervised learning from longitudinal and multimodal archives to address these challenges.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of multimodal models for pulmonary nodule diagnosis is limited by the scarcity of labeled data and the tendency of these models to overfit on the training distribution. In this work, we leverage self-supervised learning from longitudinal and multimodal archives to address these challenges. We curate an unlabeled set of patients with CT scans and linked electronic health records from our home institution to power joint embedding predictive architecture (JEPA) pretraining. After supervised finetuning, we show that our approach outperforms an unregularized multimodal model and an imaging-only model in an internal cohort (ours: 0.91, multimodal: 0.88, imaging-only: 0.73 AUC), but underperforms in an external cohort (ours: 0.72, imaging-only: 0.75 AUC). We develop a synthetic environment that characterizes the context in which JEPA may underperform. This work introduces an approach that leverages unlabeled multimodal medical archives to improve predictive models and demonstrates its advantages and limitations in pulmonary nodule diagnosis.
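As a rough illustration of the JEPA idea described in the abstract (predicting a target representation from a context representation in latent space, rather than reconstructing inputs), the core training step can be sketched with linear stand-ins for the encoders. This is not the authors' implementation; all dimensions, weight names, and the choice of linear maps are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: imaging feature size, clinical (EHR)
# feature size, and shared embedding size.
D_IMG, D_EHR, D_EMB = 32, 16, 8

# Linear maps stand in for the deep encoders and predictor.
W_ctx = rng.normal(scale=0.1, size=(D_IMG, D_EMB))           # context encoder
W_tgt = W_ctx.copy()                                         # target encoder (EMA copy)
W_pred = rng.normal(scale=0.1, size=(D_EMB + D_EHR, D_EMB))  # latent predictor

def jepa_step(x_img, x_ehr, lr=1e-2, tau=0.99):
    """One JEPA-style update: predict the target embedding from the context
    embedding plus clinical features. The loss lives in latent space, and
    the target branch receives no gradient (it is an EMA of the context)."""
    global W_ctx, W_tgt, W_pred
    z_ctx = x_img @ W_ctx                # context embedding (trained)
    z_tgt = x_img @ W_tgt                # target embedding (stop-gradient)
    c = np.concatenate([z_ctx, x_ehr])   # condition the predictor on EHR features
    err = c @ W_pred - z_tgt
    loss = float(np.mean(err ** 2))
    # Manual gradients of the mean-squared latent error.
    g_pred = 2.0 / D_EMB * np.outer(c, err)
    g_ctx = 2.0 / D_EMB * np.outer(x_img, err @ W_pred[:D_EMB].T)
    W_pred -= lr * g_pred
    W_ctx -= lr * g_ctx
    # Target encoder tracks the context encoder by exponential moving average.
    W_tgt = tau * W_tgt + (1 - tau) * W_ctx
    return loss

losses = [jepa_step(rng.normal(size=D_IMG), rng.normal(size=D_EHR))
          for _ in range(300)]
print("mean latent loss, first vs last 10 steps:",
      sum(losses[:10]) / 10, sum(losses[-10:]) / 10)
```

The latent prediction error falls over training, which is the signal JEPA optimizes; the supervised nodule classifier in the paper would then be finetuned on top of the pretrained encoders.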
Related papers
- Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations [4.505150709006532]
Foundation models hold promise for specialized medical imaging tasks, though their effectiveness in breast imaging remains underexplored. This study leverages BiomedCLIP as a foundation model to address challenges in model generalization. Using 96,995 images, we compared single-modality (s2D only) and multi-modality training approaches, addressing class imbalance through weighted contrastive learning.
arXiv Detail & Related papers (2025-11-21T22:45:50Z) - Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality. The framework aims to learn complex relationships between clinical data and genetic predispositions. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
arXiv Detail & Related papers (2025-10-24T15:56:40Z) - MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis [19.063517827476826]
We introduce MM-DINOv2, a novel framework that adapts the pre-trained vision foundation model DINOv2 for multi-modal medical imaging. Our approach incorporates multi-modal patch embeddings, enabling vision foundation models to effectively process multi-modal imaging data. Our method achieves a Matthews Correlation Coefficient (MCC) of 0.6 on an external test set, surpassing state-of-the-art supervised approaches by +11.1%.
arXiv Detail & Related papers (2025-09-08T12:34:15Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - COMPRER: A Multimodal Multi-Objective Pretraining Framework for Enhanced Medical Image Representation [1.5749416770494706]
COMPRER is a novel multi-modal, multi-objective pretraining framework.
It enhances medical-image representation, diagnostic inferences, and prognosis of diseases.
arXiv Detail & Related papers (2024-02-04T08:05:58Z) - End-To-End Prediction of Knee Osteoarthritis Progression With Multi-Modal Transformers [2.9822184411723645]
Knee Osteoarthritis (KOA) is a highly prevalent chronic musculoskeletal condition with no currently available treatment.
We leveraged recent advances in Deep Learning and developed a unified framework for the multi-modal fusion of knee imaging data.
Our follow-up analysis generally shows that prediction from the imaging data is more accurate for post-traumatic subjects.
arXiv Detail & Related papers (2023-07-03T09:10:57Z) - Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification [4.002181247287472]
We propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from EHRs for solitary pulmonary nodule (SPN) classification.
We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans.
arXiv Detail & Related papers (2023-04-06T03:03:07Z) - A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
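The K-nearest neighbor smoothing idea in the last entry above can be sketched in a few lines: each sample's predicted class probabilities are averaged with those of its nearest neighbors in feature space, pulling outlier predictions toward the local consensus. This is only an illustrative sketch of the general KNNS idea, not the paper's exact method; the function name and demo data are made up.

```python
import numpy as np

def knn_smooth(features, probs, k=3):
    """Average each sample's predicted probabilities over itself and its
    k nearest neighbors in feature space (a KNN-smoothing sketch)."""
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # For each sample, take the k+1 closest rows (the closest is itself).
    idx = np.argsort(d, axis=1)[:, :k + 1]
    return probs[idx].mean(axis=1)

# Tiny demo: two tight feature clusters; the noisy prediction in the
# first cluster is pulled toward its neighbors' consensus.
feats = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
preds = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],  # third row is an outlier
                  [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]])
smoothed = knn_smooth(feats, preds, k=2)
```

After smoothing, the outlier row agrees with its cluster (its class-0 probability rises above 0.5), while each row still sums to 1.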
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.