Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods
- URL: http://arxiv.org/abs/2406.12142v2
- Date: Tue, 22 Oct 2024 13:32:34 GMT
- Title: Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods
- Authors: Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen
- Abstract summary: We use Slice Discovery Methods to identify interpretable underperforming subsets of data and hypotheses regarding the cause of observed performance disparities.
Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
- Score: 5.274804664403783
- Abstract: Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest X-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.
Related papers
- Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), and Pseudo-Label strategies.
LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - Inspecting Model Fairness in Ultrasound Segmentation Tasks [20.281029492841878]
We inspect a series of deep learning (DL) segmentation models using two ultrasound datasets.
Our findings reveal that even state-of-the-art DL algorithms demonstrate unfair behavior in ultrasound segmentation tasks.
These results serve as a crucial warning, underscoring the necessity for careful model evaluation before their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-12-05T05:08:08Z) - Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data.
The proposed approach fuses a segmentation network, which acts as an attention module, with a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z) - (Predictable) Performance Bias in Unsupervised Anomaly Detection [3.826262429926079]
Unsupervised anomaly detection (UAD) models promise to aid in the crucial first step of disease detection.
Our study quantified the disparate performance of UAD models against certain demographic subgroups.
arXiv Detail & Related papers (2023-09-25T14:57:43Z) - How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z) - Are Sex-based Physiological Differences the Cause of Gender Bias for Chest X-ray Diagnosis? [2.1601966913620325]
We investigate the causes of gender bias in machine learning-based chest X-ray diagnosis.
In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs.
We propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets.
arXiv Detail & Related papers (2023-08-09T10:19:51Z) - Discrimination of Radiologists Utilizing Eye-Tracking Technology and Machine Learning: A Case Study [0.9142067094647588]
This study presents a novel discretized feature encoding based on binning fixation data for efficient geometric alignment.
The encoded features of the eye-fixation data are employed by machine learning classifiers to discriminate between faculty and trainee radiologists.
arXiv Detail & Related papers (2023-08-04T23:51:47Z) - Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z) - On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.