Are Sex-based Physiological Differences the Cause of Gender Bias for
Chest X-ray Diagnosis?
- URL: http://arxiv.org/abs/2308.05129v1
- Date: Wed, 9 Aug 2023 10:19:51 GMT
- Title: Are Sex-based Physiological Differences the Cause of Gender Bias for
Chest X-ray Diagnosis?
- Authors: Nina Weng, Siavash Bigdeli, Eike Petersen, Aasa Feragen
- Abstract summary: We investigate the causes of gender bias in machine learning-based chest X-ray diagnosis.
In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs.
We propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets.
- Score: 2.1601966913620325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While many studies have assessed the fairness of AI algorithms in the medical
field, the causes of differences in prediction performance are often unknown.
This lack of knowledge about the causes of bias hampers the efficacy of bias
mitigation, as evidenced by the fact that simple dataset balancing still often
performs best in reducing performance gaps but is unable to resolve all
performance differences. In this work, we investigate the causes of gender bias
in machine learning-based chest X-ray diagnosis. In particular, we explore the
hypothesis that breast tissue leads to underexposure of the lungs and causes
lower model performance. Methodologically, we propose a new sampling method
which addresses the highly skewed distribution of recordings per patient in two
widely used public datasets, while at the same time reducing the impact of
label errors. Our comprehensive analysis of gender differences across diseases,
datasets, and gender representations in the training set shows that dataset
imbalance is not the sole cause of performance differences. Moreover, relative
group performance differs strongly between datasets, indicating important
dataset-specific factors influencing male/female group performance. Finally, we
investigate the effect of breast tissue more specifically, by cropping out the
breasts from recordings, finding that this does not resolve the observed
performance gaps. In conclusion, our results indicate that dataset-specific
factors, not fundamental physiological differences, are the main drivers of
male-female performance gaps in chest X-ray analyses on the widely used NIH and
CheXpert datasets.
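The abstract mentions a sampling method that addresses the highly skewed distribution of recordings per patient, but does not specify it. The core idea can be sketched as drawing one recording per patient so that heavily imaged patients do not dominate evaluation; this is a minimal illustration under that assumption, and the function and variable names are hypothetical, not the authors' implementation:

```python
import random
from collections import defaultdict

def sample_one_record_per_patient(records, seed=0):
    """Draw a single recording per patient so that patients with many
    recordings do not dominate the sampled set.

    `records` is a list of (patient_id, recording) pairs; both names
    are illustrative placeholders.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for patient_id, recording in records:
        by_patient[patient_id].append(recording)
    # One uniformly chosen recording per patient flattens the skewed
    # recordings-per-patient distribution.
    return {pid: rng.choice(recs) for pid, recs in by_patient.items()}
```

Per-patient sampling of this kind also limits how far a single mislabeled patient's repeated recordings can propagate, which is consistent with the stated goal of reducing the impact of label errors.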
Related papers
- Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods [5.274804664403783]
We use Slice Discovery Methods to identify interpretable underperforming subsets of data and hypotheses regarding the cause of observed performance disparities.
Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
arXiv Detail & Related papers (2024-06-17T23:08:46Z)
- (Predictable) Performance Bias in Unsupervised Anomaly Detection [3.826262429926079]
Unsupervised anomaly detection (UAD) models promise to aid in the crucial first step of disease detection.
Our study quantified the disparate performance of UAD models against certain demographic subgroups.
arXiv Detail & Related papers (2023-09-25T14:57:43Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide-range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z)
- Risk of Bias in Chest Radiography Deep Learning Foundation Models [14.962566915809264]
This study used 127,118 chest radiographs from 42,884 patients (mean age, 63 years ± 17 [SD]; 23,623 male, 19,261 female) from the CheXpert dataset collected between October 2002 and July 2017.
Ten out of twelve pairwise comparisons across biological sex and race showed statistically significant differences in the studied foundation model.
Significant differences were found between male and female patients (P < .001) and between Asian and Black patients (P < .001) in the feature projections that primarily capture disease.
arXiv Detail & Related papers (2022-09-07T07:16:30Z)
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation [84.76186111434818]
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
arXiv Detail & Related papers (2022-02-04T12:08:31Z)
- Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms [20.50071537200745]
Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates.
The study concludes that the models exhibit and potentially even amplify systematic underdiagnosis.
arXiv Detail & Related papers (2022-01-19T20:51:38Z)
- Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
Undesired patterns in the collected data can render such tests unreliable.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
- Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)