Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets
- URL: http://arxiv.org/abs/2505.16027v1
- Date: Wed, 21 May 2025 21:16:50 GMT
- Title: Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets
- Authors: Qinmei Xu, Yiheng Li, Xianghao Zhan, Ahmet Gorkem Er, Brittany Dashevsky, Chuanjun Xu, Mohammed Alawad, Mengya Yang, Liu Ya, Changsheng Zhou, Xiao Li, Haruka Itakura, Olivier Gevaert
- Abstract summary: Foundation models leveraging vision-language pretraining have shown promise in chest X-ray (CXR) interpretation. This study benchmarks the diagnostic performance and generalizability of foundation models versus traditional convolutional neural networks (CNNs) on multinational CXR datasets.
- Score: 5.770825110701877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models leveraging vision-language pretraining have shown promise in chest X-ray (CXR) interpretation, yet their real-world performance across diverse populations and diagnostic tasks remains insufficiently evaluated. This study benchmarks the diagnostic performance and generalizability of foundation models versus traditional convolutional neural networks (CNNs) on multinational CXR datasets. We evaluated eight CXR diagnostic models - five vision-language foundation models and three CNN-based architectures - across 37 standardized classification tasks using six public datasets from the USA, Spain, India, and Vietnam, and three private datasets from hospitals in China. Performance was assessed using AUROC, AUPRC, and other metrics across both shared and dataset-specific tasks. Foundation models outperformed CNNs in both accuracy and task coverage. MAVL, a model incorporating knowledge-enhanced prompts and structured supervision, achieved the highest performance on public (mean AUROC: 0.82; AUPRC: 0.32) and private (mean AUROC: 0.95; AUPRC: 0.89) datasets, ranking first in 14 of 37 public and 3 of 4 private tasks. All models showed reduced performance on pediatric cases, with average AUROC dropping from 0.88 +/- 0.18 in adults to 0.57 +/- 0.29 in children (p = 0.0202). These findings highlight the value of structured supervision and prompt design in radiologic AI and suggest future directions including geographic expansion and ensemble modeling for clinical deployment. Code for all evaluated models is available at https://drive.google.com/drive/folders/1B99yMQm7bB4h1sVMIBja0RfUu8gLktCE
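The abstract reports mean AUROC and AUPRC across classification tasks. The following is a minimal sketch of how such per-task metrics could be computed and then averaged. It is not the authors' evaluation code: the function names, the use of average precision as the AUPRC estimate, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) formulation; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of AUPRC (ties are taken in sort order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recalled positive
    return total / n_pos


def mean_over_tasks(tasks: Dict[str, Tuple[List[int], List[float]]]) -> Tuple[float, float]:
    """Unweighted mean of per-task AUROC and average precision."""
    aurocs = [auroc(y, s) for y, s in tasks.values()]
    aps = [average_precision(y, s) for y, s in tasks.values()]
    return sum(aurocs) / len(aurocs), sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy data: task name -> (binary labels, model scores).
    toy_tasks = {
        "task_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        "task_b": ([1, 0, 0, 1, 0], [0.9, 0.2, 0.6, 0.7, 0.1]),
    }
    mean_auroc, mean_ap = mean_over_tasks(toy_tasks)
    print(f"mean AUROC: {mean_auroc:.3f}, mean AP: {mean_ap:.3f}")
```

Because AUPRC depends on class prevalence while AUROC does not, a rare finding can yield a low AUPRC even when its AUROC is high, which is one reason to report both.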
Related papers
- A multimodal ensemble approach for clear cell renal cell carcinoma treatment outcome prediction [6.199310532720352]
We developed a multi-modal ensemble model (MMEM) that integrates clinical data, multi-omics data, and histopathology whole slide image (WSI) data. MMEM predicted overall survival (OS) and disease-free survival (DFS) for ccRCC patients.
arXiv Detail & Related papers (2024-12-10T02:51:14Z)
- AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets [0.33923727961771083]
Lung cancer remains the leading cause of cancer-related mortality worldwide. With the growing integration of artificial intelligence into medical imaging, the development and evaluation of robust AI models require access to large, well-annotated datasets. We benchmark deep learning models for both 3D nodule detection and lung cancer classification.
arXiv Detail & Related papers (2024-05-07T18:36:40Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- A generalized deep learning model for multi-disease Chest X-Ray diagnostics [0.0]
We investigate the generalizability of deep convolutional neural network (CNN) on the task of disease classification from chest x-rays collected over multiple sites.
We train the model using datasets from three independent sites with different patient populations.
Our model generalizes better when trained on multiple datasets.
arXiv Detail & Related papers (2020-10-17T18:57:40Z)
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
- A Systematic Search over Deep Convolutional Neural Network Architectures for Screening Chest Radiographs [4.6411273009803065]
Chest radiographs are used for the screening of pulmonary and cardio-/thoracic conditions.
Recent efforts demonstrate a performance benchmark using an ensemble of deep convolutional neural networks (CNNs).
Our systematic search over multiple standard CNN architectures identified single candidate models whose classification performance was on par with ensembles.
arXiv Detail & Related papers (2020-04-24T12:30:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.