Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets
- URL: http://arxiv.org/abs/2512.06014v1
- Date: Wed, 03 Dec 2025 12:55:44 GMT
- Title: Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets
- Authors: Jiho Shin, Dominic Marshall, Matthieu Komorowski
- Abstract summary: This work benchmarks two large-scale chest X-ray (CXR) embedding models on the public MIMIC-CXR and NIH ChestX-ray14 datasets. We extracted embeddings directly from pre-trained encoders, trained lightweight LightGBM classifiers on multiple disease labels, and reported mean AUROC and F1-score with 95% confidence intervals.
- Score: 0.35441912284181126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent foundation models have demonstrated strong performance in medical image representation learning, yet their comparative behaviour across datasets remains underexplored. This work benchmarks two large-scale chest X-ray (CXR) embedding models (CXR-Foundation (ELIXR v2.0) and MedImageInsight) on the public MIMIC-CXR and NIH ChestX-ray14 datasets. Each model was evaluated using a unified preprocessing pipeline and fixed downstream classifiers to ensure reproducible comparison. We extracted embeddings directly from pre-trained encoders, trained lightweight LightGBM classifiers on multiple disease labels, and reported mean AUROC and F1-score with 95% confidence intervals. MedImageInsight achieved slightly higher performance across most tasks, while CXR-Foundation exhibited strong cross-dataset stability. Unsupervised clustering of MedImageInsight embeddings further revealed a coherent disease-specific structure consistent with quantitative results. The results highlight the need for standardised evaluation of medical foundation models and establish reproducible baselines for future multimodal and clinical integration studies.
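As a concrete illustration of the evaluation recipe described in the abstract, the sketch below trains a LightGBM classifier on frozen-encoder embeddings for one binary label and bootstraps a 95% confidence interval for AUROC. The arrays, embedding width, and hyperparameters are synthetic placeholders, not the paper's exact configuration.

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Placeholder data standing in for embeddings extracted from a frozen
# CXR encoder, plus binary disease labels.
X_train, y_train = rng.normal(size=(800, 768)), rng.integers(0, 2, 800)
X_test, y_test = rng.normal(size=(200, 768)), rng.integers(0, 2, 200)

# One lightweight classifier per disease label, trained on embeddings only.
clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# Bootstrap the test set to obtain a 95% CI on AUROC.
aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_test), len(y_test))
    if len(np.unique(y_test[idx])) < 2:  # AUROC needs both classes present
        continue
    aucs.append(roc_auc_score(y_test[idx], scores[idx]))
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUROC {roc_auc_score(y_test, scores):.3f} (95% CI {lo:.3f}-{hi:.3f}), "
      f"F1 {f1_score(y_test, (scores > 0.5).astype(int)):.3f}")
```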
Related papers
- Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering.
Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition.
We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z)
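The summary above gives only the high-level idea. As a rough, hypothetical sketch of inference-time latent steering in general (not this paper's exact construction), one can remove the component of a model's hidden states that lies along a precomputed intervention direction:

```python
import torch

def apply_steering(hidden: torch.Tensor, steer_vec: torch.Tensor,
                   alpha: float = 1.0) -> torch.Tensor:
    """Project hidden states away from the steering direction."""
    v = steer_vec / steer_vec.norm()
    proj = (hidden @ v).unsqueeze(-1) * v  # component along the direction
    return hidden - alpha * proj

# Placeholder shapes: (batch, tokens, dim) hidden states and a vector
# assumed to encode the unwanted "prior comparison" semantics.
hidden = torch.randn(1, 16, 512)
steer_vec = torch.randn(512)
steered = apply_steering(hidden, steer_vec)
```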
- Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.
Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.
We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z)
- Fairness and Robustness of CLIP-Based Models for Chest X-rays [9.082174810187931]
We extensively evaluate six widely used CLIP-based models on chest X-ray classification using three publicly available datasets.
We assess the models' fairness across six conditions and patient subgroups based on age, sex, and race.
Our results indicate performance gaps between patients of different ages, but more equitable results for the other attributes.
arXiv Detail & Related papers (2025-07-28T19:25:16Z)
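A minimal sketch of this kind of subgroup audit, assuming per-patient prediction scores, labels, and a demographic attribute are available (the data below is synthetic): compute AUROC per group and report the gap.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "score": rng.random(1000),            # model probabilities for one finding
    "label": rng.integers(0, 2, 1000),    # ground-truth labels
    "sex": rng.choice(["F", "M"], 1000),  # one of the audited attributes
})

# AUROC per subgroup, and the best-vs-worst gap as a simple fairness signal.
per_group = {g: roc_auc_score(d["label"], d["score"])
             for g, d in df.groupby("sex")}
gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"gap={gap:.3f}")
```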
- Chest X-ray Foundation Model with Global and Local Representations Integration [13.736829173377355]
CheXFound is a vision foundation model that learns robust CXR representations and generalizes effectively across a wide range of downstream tasks.
We pretrain CheXFound on a curated CXR-1M dataset, comprising over one million unique CXRs from publicly available sources.
Our experimental results show that CheXFound outperforms state-of-the-art models in classifying 40 disease findings across different prevalence levels.
arXiv Detail & Related papers (2025-02-07T18:16:15Z)
- LTCXNet: Advancing Chest X-Ray Analysis with Solutions for Long-Tailed Multi-Label Classification and Fairness Challenges [4.351007758390175]
The pruned MIMIC-CXR-LT dataset is designed to represent a long-tailed and multi-label data scenario.
We introduce LTCXNet, a novel framework that integrates the ConvNeXt model, ML-Decoder, and strategic data augmentation.
arXiv Detail & Related papers (2024-11-16T08:59:20Z)
- CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset [14.911363203907008]
X-ray image-based medical report generation can significantly reduce diagnostic burdens and patient wait times.
We conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs) on the CheXpert Plus dataset.
We propose a large model for X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and X-ray-report contrastive learning.
arXiv Detail & Related papers (2024-10-01T04:07:01Z)
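As a sketch of the contrastive stage in spirit, the snippet below shows a generic CLIP-style symmetric loss between X-ray and report embeddings; the embedding sizes and temperature are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched X-ray/report pairs (diagonal)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(len(img))           # i-th image matches i-th report
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```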
- Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
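A minimal sketch of text-to-image retrieval evaluation, assuming paired report and image embeddings in a common space (synthetic below): rank images by cosine similarity to each report and compute Recall@k.

```python
import torch
import torch.nn.functional as F

def recall_at_k(txt_emb, img_emb, k=5):
    """Fraction of reports whose paired image appears in the top-k ranks."""
    txt = F.normalize(txt_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    ranks = (txt @ img.t()).argsort(dim=-1, descending=True)
    match = torch.arange(len(txt)).unsqueeze(-1)  # i-th report pairs with i-th image
    return (ranks[:, :k] == match).any(dim=-1).float().mean().item()

print(recall_at_k(torch.randn(100, 128), torch.randn(100, 128)))
```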
- MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images [3.6615129560354527]
Multi-modal fusion approaches aim to integrate information from different data sources.
Unlike natural datasets, such as in audio-visual applications, data in healthcare is often collected asynchronously.
We propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input.
arXiv Detail & Related papers (2022-07-14T15:59:03Z)
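A hypothetical sketch in the spirit of that design, not the paper's exact module: project each available modality into a shared space, feed the projections to an LSTM as a short sequence, and classify from the final state, so a uni-modal input is simply a one-step sequence.

```python
import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    def __init__(self, ehr_dim=64, img_dim=512, hidden=256, n_classes=2):
        super().__init__()
        self.proj_ehr = nn.Linear(ehr_dim, hidden)  # placeholder dims
        self.proj_img = nn.Linear(img_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, ehr_emb, img_emb=None):
        steps = [self.proj_ehr(ehr_emb)]
        if img_emb is not None:              # the image modality may be absent
            steps.append(self.proj_img(img_emb))
        seq = torch.stack(steps, dim=1)      # (batch, n_modalities, hidden)
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])              # classify from the last state

logits = LSTMFusion()(torch.randn(4, 64), torch.randn(4, 512))
```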
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest X-ray (CXR) identification performance through lung segmentation.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing the lung region in CXR images, and a CXR classification model whose backbone is a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR datasets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
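A toy sketch of that cascade with stand-in networks (the paper uses XLSor for segmentation and a MoCo-pretrained backbone for classification): predict a lung mask, gate the input with it, and classify the masked image.

```python
import torch
import torch.nn as nn

# Stand-ins for the two cascaded modules.
seg_net = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())
clf_net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

x = torch.randn(4, 1, 224, 224)       # batch of CXR images
mask = (seg_net(x) > 0.5).float()     # predicted lung region
logits = clf_net(x * mask)            # classify the lung-gated image
```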
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
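One simplified reading of the KNN-smoothing idea, with synthetic stand-ins for the model's embeddings and predictions: average each sample's predicted probabilities with those of its nearest neighbors in embedding space.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 64))       # feature embeddings (synthetic)
probs = rng.random((500, 14))          # per-class predictions (synthetic)

# Indices of each sample's 6 nearest neighbors (itself included).
nn_idx = NearestNeighbors(n_neighbors=6).fit(emb).kneighbors(
    emb, return_distance=False)
smoothed = probs[nn_idx].mean(axis=1)  # neighbor-averaged predictions
```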
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
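A generic sketch of the core consistency idea on unlabeled data (the relation-driven term is the paper's addition and is omitted here): penalize divergence between predictions for two perturbed views of the same input.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, noise_std=0.1):
    """Encourage stable predictions under input perturbations."""
    p1 = F.softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
    p2 = F.softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
    return F.mse_loss(p1, p2)

model = torch.nn.Linear(784, 14)   # stand-in classifier
loss = consistency_loss(model, torch.randn(16, 784))
```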