Related papers: Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning

Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning

URL: http://arxiv.org/abs/2407.04449v1
Date: Fri, 5 Jul 2024 12:04:12 GMT
Title: Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning
Authors: Saeed Shurrab, Alejandro Guerra-Manzanares, Farah E. Shamout,
Abstract summary: We propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest X-ray representations. Our work highlights the potential of EHR-enhanced self-supervised pre-training for medical imaging.
Score: 46.674521557701816
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. While such approaches deliver promising results, they do not leverage associated patient or scan information collected within Electronic Health Records (EHR). Here, we propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest X-ray representations. We investigate three types of EHR data, including demographic, scan metadata, and inpatient stay information. We evaluate our approach on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, specifically ViT-Tiny and ViT-Small. In assessing the quality of the representations via linear evaluation, our proposed method demonstrates significant improvement compared to vanilla MSN and state-of-the-art self-supervised learning baselines. Our work highlights the potential of EHR-enhanced self-supervised pre-training for medical imaging. The code is publicly available at: https://github.com/nyuad-cai/CXR-EHR-MSN

Related papers

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data [0.6183104361749774]
Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images.<n>We address these challenges by leveraging 2D vision foundation models (VFMs)<n>Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.4% trained on a dataset of only 300 labeled QSM and NM-MRI scans.
arXiv Detail & Related papers (2025-09-22T10:59:27Z)
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision [24.846428105192405]
We train M3Ret, a unified visual encoder, without any modality-specific customization.<n>It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms.<n>Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv3 and the text-supervised BMC-CLIP.
arXiv Detail & Related papers (2025-09-01T10:59:39Z)
Mitigating Catastrophic Forgetting in the Incremental Learning of Medical Images [1.1510009152620668]
This paper proposes an Incremental Learning (IL) approach to enhance the accuracy and efficiency of deep learning models in analyzing T2-weighted (T2w) MRI medical images prostate cancer detection. We used multiple health centers' artificial intelligence and radiology data, focused on different tasks that looked at prostate cancer detection using MRI (PI-CAI) We utilized Knowledge Distillation (KD) as it employs generated images from past tasks to guide the training of models for subsequent tasks.
arXiv Detail & Related papers (2025-04-28T17:56:04Z)
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation [15.257119888131609]
We propose enhanced contrastive learning with Multi-view Longitudinal data to facilitate chest X-ray Report Generation, named MLRG. Specifically, we introduce a multi-view longitudinal contrast learning method that integrates spatial information from current multi-view images and temporal information from longitudinal data. We present a tokenized absence encoding technique to handle missing patient-specific prior knowledge, allowing the model to produce more accurate radiology reports based on available prior knowledge.
arXiv Detail & Related papers (2025-02-27T12:59:04Z)
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis [1.2903829793534272]
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions. Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records. This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG) Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
arXiv Detail & Related papers (2024-03-22T19:19:51Z)
MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning [6.4136876268620115]
MLVICX is an approach to capture rich representations in the form of embeddings from chest X-ray images. We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning.
arXiv Detail & Related papers (2024-03-18T06:19:37Z)
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images [1.883452979588382]
Self-supervised learning (SSL) can be applied to chest radiographs to learn robust features. We tested our approach on over 800,000 chest radiographs from six large global datasets.
arXiv Detail & Related papers (2023-08-15T10:37:13Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis [7.214195462426705]
We introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images. Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets.
arXiv Detail & Related papers (2021-11-29T18:45:20Z)
Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.