On Biases in a UK Biobank-based Retinal Image Classification Model
- URL: http://arxiv.org/abs/2408.02676v2
- Date: Fri, 25 Oct 2024 16:51:19 GMT
- Title: On Biases in a UK Biobank-based Retinal Image Classification Model
- Authors: Anissa Alloula, Rima Mustafa, Daniel R McGowan, Bartłomiej W. Papież
- Abstract summary: We explore whether disparities are present in the UK Biobank fundus retinal images by training and evaluating a disease classification model on these images.
We find substantial differences in performance across population groups despite the model's strong overall performance.
We also find that existing bias mitigation methods are largely unable to enhance fairness, highlighting the need for better methods tailored to the specific type of bias.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has uncovered alarming disparities in the performance of machine learning models in healthcare. In this study, we explore whether such disparities are present in the UK Biobank fundus retinal images by training and evaluating a disease classification model on these images. We assess possible disparities across various population groups and find substantial differences despite strong overall performance of the model. In particular, we discover unfair performance for certain assessment centres, which is surprising given the rigorous data standardisation protocol. We compare how these differences emerge and apply a range of existing bias mitigation methods to each one. A key insight is that each disparity has unique properties and responds differently to the mitigation methods. We also find that these methods are largely unable to enhance fairness, highlighting the need for better bias mitigation methods tailored to the specific type of bias.
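The kind of subgroup assessment described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which metric the authors use, so plain accuracy stands in for it, and the function names, group labels (`centre_A`, `centre_B`), and toy predictions are hypothetical rather than taken from the paper or from UK Biobank.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(acc_by_group):
    """Return the difference between the best and worst group accuracy."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


# Toy, made-up labels and predictions (not real data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["centre_A"] * 4 + ["centre_B"] * 4

acc = per_group_accuracy(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, overall, accuracy_gap(acc))
```

In this toy case the overall accuracy hides a gap between the two hypothetical centres, which is the pattern the abstract describes: an aggregate metric can look acceptable while individual groups differ substantially.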
Related papers
- Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline [42.49727243388804]
We propose an in-the-wild multimodal plant disease recognition dataset.
It contains not only the largest number of disease classes but also text-based descriptions for each disease.
Our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world.
arXiv Detail & Related papers (2024-08-06T11:49:13Z) - A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models [22.522156479335706]
This paper conducts the first large-scale empirical study to compare the performance of existing state-of-the-art fairness improving techniques.
Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes.
Different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results.
arXiv Detail & Related papers (2024-01-08T06:53:33Z) - The Role of Subgroup Separability in Group-Fair Medical Image
Classification [18.29079361470428]
We find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis.
Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
arXiv Detail & Related papers (2023-07-06T06:06:47Z) - Rethinking Semi-Supervised Medical Image Segmentation: A
Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - MEDFAIR: Benchmarking Fairness for Medical Imaging [44.73351338165214]
MEDFAIR is a framework to benchmark the fairness of machine learning models for medical imaging.
We find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes.
We make recommendations for different medical application scenarios that require different ethical principles.
arXiv Detail & Related papers (2022-10-04T16:30:47Z) - Learning Discriminative Representation via Metric Learning for Imbalanced Medical Image Classification [52.94051907952536]
We propose embedding metric learning into the first stage of the two-stage framework specially to help the feature extractor learn to extract more discriminative feature representations.
Experiments mainly on three medical image datasets show that the proposed approach consistently outperforms existing one-stage and two-stage approaches.
arXiv Detail & Related papers (2022-07-14T14:57:01Z) - Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing disparities in medical AI performance across sites.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in deep learning-based medical image analysis systems.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)