Risk of Bias in Chest Radiography Deep Learning Foundation Models
- URL: http://arxiv.org/abs/2209.02965v3
- Date: Fri, 29 Sep 2023 21:18:19 GMT
- Title: Risk of Bias in Chest Radiography Deep Learning Foundation Models
- Authors: Ben Glocker, Charles Jones, Melanie Roschewitz, Stefan Winzeck
- Abstract summary: This study used 127,118 chest radiographs from 42,884 patients (mean age, 63 years ± 17 [SD]; 23,623 male, 19,261 female) from the CheXpert dataset collected between October 2002 and July 2017.
Ten out of twelve pairwise comparisons across biological sex and race showed statistically significant differences in the studied foundation model.
Significant differences were found between male and female (P < .001) and Asian and Black patients (P < .001) in the feature projections that primarily capture disease.
- Score: 14.962566915809264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: To analyze a recently published chest radiography foundation model
for the presence of biases that could lead to subgroup performance disparities
across biological sex and race.
Materials and Methods: This retrospective study used 127,118 chest
radiographs from 42,884 patients (mean age, 63 years ± 17 [SD]; 23,623 male,
19,261 female) from the CheXpert dataset collected between October 2002 and
July 2017. To determine the presence of bias in features generated by a chest
radiography foundation model and baseline deep learning model, dimensionality
reduction methods together with two-sample Kolmogorov-Smirnov tests were used
to detect distribution shifts across sex and race. A comprehensive disease
detection performance analysis was then performed to associate any biases in
the features to specific disparities in classification performance across
patient subgroups.
Results: Ten out of twelve pairwise comparisons across biological sex and
race showed statistically significant differences in the studied foundation
model, compared with four significant tests in the baseline model. Significant
differences were found between male and female (P < .001) and Asian and Black
patients (P < .001) in the feature projections that primarily capture disease.
Compared with average model performance across all subgroups, classification
performance on the 'no finding' label dropped between 6.8% and 7.8% for female
patients, and performance in detecting 'pleural effusion' dropped between 10.7%
and 11.6% for Black patients.
Conclusion: The studied chest radiography foundation model demonstrated
racial and sex-related bias leading to disparate performance across patient
subgroups and may be unsafe for clinical applications.
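The feature-level test described in the abstract (dimensionality reduction followed by two-sample Kolmogorov-Smirnov tests between subgroups) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic features, the group names, the choice of four principal components, and the helper names `pca_project` and `ks_two_sample` are all assumptions made for illustration. The KS p-value uses a standard asymptotic approximation and no multiple-testing correction is applied; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import itertools
import numpy as np


def pca_project(features, n_components=4):
    """Project features onto their top principal components (PCA via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def ks_two_sample(a, b):
    """Two-sample KS statistic D with an asymptotic p-value approximation."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    # Empirical CDFs of both samples, evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    en = np.sqrt(a.size * b.size / (a.size + b.size))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    k = np.arange(1, 101)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * (k * lam) ** 2))
    return d, float(min(max(p, 0.0), 1.0))


# Synthetic stand-in for model features (assumption: 64-D, 500 samples per
# group). group_b is shifted so that its feature distribution differs.
rng = np.random.default_rng(0)
groups = {
    "group_a": rng.normal(0.0, 1.0, size=(500, 64)),
    "group_b": rng.normal(0.5, 1.0, size=(500, 64)),
    "group_c": rng.normal(0.0, 1.0, size=(500, 64)),
}
labels = np.concatenate([[name] * len(x) for name, x in groups.items()])
projected = pca_project(np.vstack(list(groups.values())), n_components=4)

# One KS test per subgroup pair and per principal component.
results = {}
for g1, g2 in itertools.combinations(groups, 2):
    results[(g1, g2)] = [
        ks_two_sample(projected[labels == g1, c], projected[labels == g2, c])
        for c in range(projected.shape[1])
    ]

for pair, tests in results.items():
    for c, (d, p) in enumerate(tests):
        print(f"{pair[0]} vs {pair[1]}, PC{c + 1}: D={d:.3f}, p={p:.3g}")
```

A small p-value on a given component indicates that the two subgroups are distributed differently along that direction of feature space. Whether such a shift translates into a performance disparity is a separate question, which the study addresses with its subgroup classification analysis.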
Related papers
- Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods [5.274804664403783]
We use Slice Discovery Methods to identify interpretable underperforming subsets of data and hypotheses regarding the cause of observed performance disparities.
Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
arXiv Detail & Related papers (2024-06-17T23:08:46Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Analysing race and sex bias in brain age prediction [18.68533487971233]
We analyse the commonly used ResNet-34 model by conducting a subgroup performance analysis and feature inspection.
Our results reveal statistically significant differences in predictive performance between Black and White, Black and Asian, and male and female subjects.
arXiv Detail & Related papers (2023-09-19T14:40:19Z)
- Multivariate Analysis on Performance Gaps of Artificial Intelligence Models in Screening Mammography [4.123006816939975]
Deep learning models for abnormality classification can perform well in screening mammography.
The demographic, imaging, and clinical characteristics associated with increased risk of model failure remain unclear.
We assessed model performance by subgroups defined by age, race, pathologic outcome, tissue density, and imaging characteristics.
arXiv Detail & Related papers (2023-05-08T02:28:45Z)
- TotalSegmentator: robust segmentation of 104 anatomical structures in CT images [48.50994220135258]
We present a deep learning segmentation model for body CT images.
The model can segment 104 anatomical structures relevant for use cases such as organ volumetry, disease characterization, and surgical or radiotherapy planning.
arXiv Detail & Related papers (2022-08-11T15:16:40Z)
- Generalizable and Robust Deep Learning Algorithm for Atrial Fibrillation Diagnosis Across Ethnicities, Ages and Sexes [0.0]
This study is the first to develop and assess the generalization performance of a deep learning (DL) model for AF event detection.
The model, ArNet2, was developed on a large retrospective dataset of 2,147 patients totaling 51,386 hours of continuous electrocardiogram (ECG).
It was validated on a retrospective dataset of 1,730 consecutive Holter recordings from the Rambam Hospital Holter clinic, Haifa, Israel.
arXiv Detail & Related papers (2022-07-20T05:49:16Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning [48.7576911714538]
We propose a platelet deposition model and an inferential scheme to estimate the biologically meaningful parameters using approximate computation.
This work opens up an unprecedented opportunity for personalized pathology testing for CVD detection and medical treatment.
arXiv Detail & Related papers (2020-10-13T15:20:21Z)
- Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date.
The data are manually annotated in terms of identity, exact age, gender and kinship.
A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.