Related papers: Fairness and Robustness of CLIP-Based Models for Chest X-rays

URL: http://arxiv.org/abs/2507.21291v1
Date: Mon, 28 Jul 2025 19:25:16 GMT
Title: Fairness and Robustness of CLIP-Based Models for Chest X-rays
Authors: Théo Sourget, David Restrepo, Céline Hudelot, Enzo Ferrante, Stergios Christodoulidis, Maria Vakalopoulou
Abstract summary: We extensively evaluate six widely used CLIP-based models on chest X-ray classification using three publicly available datasets. We assess the models' fairness across six conditions and patient subgroups based on age, sex, and race. Our results indicate performance gaps between patients of different ages, but more equitable results for the other attributes.
Score: 9.082174810187931
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Motivated by the strong performance of CLIP-based models in natural image-text domains, recent efforts have adapted these architectures to medical tasks, particularly in radiology, where large paired datasets of images and reports, such as chest X-rays, are available. While these models have shown encouraging results in terms of accuracy and discriminative performance, their fairness and robustness in the different clinical tasks remain largely underexplored. In this study, we extensively evaluate six widely used CLIP-based models on chest X-ray classification using three publicly available datasets: MIMIC-CXR, NIH-CXR14, and NEATX. We assess the models' fairness across six conditions and patient subgroups based on age, sex, and race. Additionally, we assess the robustness to shortcut learning by evaluating performance on pneumothorax cases with and without chest drains. Our results indicate performance gaps between patients of different ages, but more equitable results for the other attributes. Moreover, all models exhibit lower performance on images without chest drains, suggesting reliance on spurious correlations. We further complement the performance analysis with a study of the embeddings generated by the models. While the sensitive attributes could be classified from the embeddings, we do not see such patterns using PCA, showing the limitations of these visualisation techniques when assessing models. Our code is available at https://github.com/TheoSourget/clip_cxr_fairness

Related papers

Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification [15.98427699337596]
We perform a comprehensive fairness analysis of CLIP-like models applied to X-ray image classification. We assess their performance and fairness across diverse patient demographics and disease categories using zero-shot inference and various fine-tuning techniques.
arXiv Detail & Related papers (2025-01-31T12:23:50Z)
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval [2.9801426627439453]
This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data. By addressing these limitations, we can develop more reliable cross-domain retrieval models for medical applications.
arXiv Detail & Related papers (2025-01-15T20:37:04Z)
Mask of truth: model sensitivity to unexpected regions of medical images [0.9896218845636701]
We challenge the capacity of convolutional neural networks (CNN) to classify chest X-rays and eye fundus images. We show that all models trained on the Pad dataset, irrespective of the masking strategy, are able to obtain an Area Under the Curve (AUC) above random. We also reveal a possible spurious correlation in the Chaksu dataset while the performances are more aligned with the expectation of an unbiased model.
arXiv Detail & Related papers (2024-12-05T10:06:58Z)
Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods [5.274804664403783]
We use Slice Discovery Methods to identify interpretable underperforming subsets of data and hypotheses regarding the cause of observed performance disparities. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
arXiv Detail & Related papers (2024-06-17T23:08:46Z)
How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z)
A knee cannot have lung disease: out-of-distribution detection with in-distribution voting using the medical example of chest X-ray classification [58.720142291102135]
The study employed the commonly used chest X-ray classification model, CheXnet, trained on the chest X-ray 14 data set. To detect OOD data for multi-label classification, we proposed in-distribution voting (IDV). The proposed IDV approach trained on ID (chest X-ray 14) and OOD data (IRMA and ImageNet) achieved, on average, 0.999 OOD detection AUC across the three data sets.
arXiv Detail & Related papers (2022-08-01T18:20:36Z)
Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations. Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
The pitfalls of using open data to develop deep learning solutions for COVID-19 detection in chest X-rays [64.02097860085202]
Deep learning models have been developed to identify COVID-19 from chest X-rays. Results have been exceptional when training and testing on open-source data. Data analysis and model evaluations show that the popular open-source dataset COVIDx is not representative of the real clinical problem.
arXiv Detail & Related papers (2021-09-14T10:59:11Z)
CheXbreak: Misclassification Identification for Deep Learning Models Interpreting Chest X-rays [5.263502842508203]
We first investigate whether there are patient subgroups that chest x-ray models are likely to misclassify. Patient age and the radiographic finding of lung lesion or pneumothorax are statistically relevant features for predicting misclassification for some chest x-ray models. We develop misclassification predictors on chest x-ray models using their outputs and clinical features.
arXiv Detail & Related papers (2021-03-18T00:30:19Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures. We built four training datasets combining images from public chest x-ray datasets and our institutional archive. We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises challenges. We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories. Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.