Risk of Training Diagnostic Algorithms on Data with Demographic Bias
- URL: http://arxiv.org/abs/2005.10050v2
- Date: Wed, 17 Jun 2020 11:33:59 GMT
- Title: Risk of Training Diagnostic Algorithms on Data with Demographic Bias
- Authors: Samaneh Abbasi-Sureshjani, Ralf Raumanns, Britt E. J. Michels, Gerard
Schouten, Veronika Cheplygina
- Abstract summary: We conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications.
Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used.
We show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup.
- Score: 0.5599792629509227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the critical challenges in machine learning applications is to have
fair predictions. There are numerous recent examples in various domains that
convincingly show that algorithms trained with biased datasets can easily lead
to erroneous or discriminatory conclusions. This is even more critical in
clinical applications, where predictive algorithms are designed mainly from a
limited or given set of medical images, and demographic variables such as age,
sex, and race are not taken into account. In this work, we conduct a survey
of the MICCAI 2018 proceedings to investigate the common practice in medical
image analysis applications. Surprisingly, we found that papers focusing on
diagnosis rarely describe the demographics of the datasets used, and the
diagnosis is purely based on images. In order to highlight the importance of
considering the demographics in diagnosis tasks, we used a publicly available
dataset of skin lesions. We then demonstrate that a classifier with an overall
area under the curve (AUC) of 0.83 has variable performance between 0.76 and
0.91 on subgroups based on age and sex, even though the training set was
relatively balanced. Moreover, we show that it is possible to learn unbiased
features by explicitly using demographic variables in an adversarial training
setup, which leads to balanced scores per subgroup. Finally, we discuss the
implications of these results and provide recommendations for further research.
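To make the subgroup evaluation concrete: below is a minimal sketch of per-subgroup AUC, assuming predicted probabilities from a fitted classifier and a parallel array of subgroup labels (the subgroup names and the simulated score gap are illustrative, not values from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, groups):
    """Overall AUC plus AUC restricted to each demographic subgroup."""
    results = {"overall": roc_auc_score(y_true, y_score)}
    for g in np.unique(groups):
        mask = groups == g
        # AUC is only defined when both classes occur in the subgroup.
        if len(np.unique(y_true[mask])) == 2:
            results[str(g)] = roc_auc_score(y_true[mask], y_score[mask])
    return results

# Toy data: a classifier that is noisier (weaker) on one subgroup, so a
# single overall AUC hides a gap like the paper's 0.76-vs-0.91 spread.
rng = np.random.default_rng(0)
groups = rng.choice(["female<60", "female>=60", "male<60", "male>=60"], 2000)
y_true = rng.integers(0, 2, 2000)
noise = np.where(groups == "male>=60", 1.5, 0.7)
y_score = 1.0 / (1.0 + np.exp(-(2 * y_true - 1 + rng.normal(0.0, noise))))
print(subgroup_auc(y_true, y_score, groups))
```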
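The adversarial setup pairs the diagnosis loss with an adversary that tries to recover the demographic variable from the learned features; reversing the adversary's gradient pushes the encoder toward features that carry no demographic signal. A minimal PyTorch sketch of that idea (the layer sizes, the single shared optimizer, and the gradient-reversal weight `lam` are assumptions for illustration, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lam on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU())  # image features -> z
diagnosis_head = nn.Linear(64, 1)   # e.g. benign vs. malignant
adversary = nn.Linear(64, 2)        # predicts the demographic variable from z

params = [*encoder.parameters(), *diagnosis_head.parameters(),
          *adversary.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def training_step(x, y_diag, y_group, lam=1.0):
    z = encoder(x)
    loss_diag = bce(diagnosis_head(z).squeeze(1), y_diag)
    # The adversary learns to predict the group; through the reversed
    # gradient, the encoder simultaneously learns to make that hard.
    loss_adv = ce(adversary(GradReverse.apply(z, lam)), y_group)
    opt.zero_grad()
    (loss_diag + loss_adv).backward()
    opt.step()
    return loss_diag.item(), loss_adv.item()

# One toy step on random inputs.
x = torch.randn(32, 256)
print(training_step(x, torch.randint(0, 2, (32,)).float(),
                    torch.randint(0, 2, (32,))))
```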
Related papers
- Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), and Pseudo-Label strategies.
LwF and Pseudo-Label achieve the best classification performance, but when fairness metrics are included in the evaluation, Pseudo-Label is clearly less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z)
- Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging [13.141767097232796]
Self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations.
It is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients.
This study investigates the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally-sourced datasets.
arXiv Detail & Related papers (2024-02-22T18:59:53Z)
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We present AEquity, a data-centric, model-agnostic, task-agnostic approach that evaluates dataset bias by measuring how easily different groups are learned at small sample sizes.
We then systematically analyze AEq values across subpopulations to identify manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be used to advance equity by diagnosing and remediating bias in healthcare datasets.
arXiv Detail & Related papers (2023-11-06T17:08:41Z)
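AEq itself is the paper's own construction; the sketch below only illustrates the underlying idea of comparing how easily subgroups are learned at small sample sizes, via repeated subsampling and a simple stand-in classifier (the synthetic data, group labels, and choice of logistic regression are all assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def learnability_curve(X, y, sizes, n_trials=20, seed=0):
    """Mean held-out AUC of a simple model trained at increasing sample sizes."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in sizes:
        aucs = []
        for _ in range(n_trials):
            idx = rng.permutation(len(y))
            train, test = idx[:n], idx[n:]
            if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
                continue  # both classes needed to fit and to score AUC
            clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
            aucs.append(roc_auc_score(y[test],
                                      clf.predict_proba(X[test])[:, 1]))
        curve.append(float(np.mean(aucs)))
    return curve

# A subgroup that is consistently harder to learn at small n flags a
# potential dataset bias worth investigating further.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))
y = (X[:, 0] + rng.normal(0.0, 1.0, 600) > 0).astype(int)
groups = rng.choice(["group_a", "group_b"], 600)
for g in np.unique(groups):
    m = groups == g
    print(g, learnability_curve(X[m], y[m], sizes=[25, 50, 100, 200]))
```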
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot approach for skin lesion classification that generalizes well from little labelled data.
The proposed approach fuses a segmentation network, which acts as an attention module, with a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
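One simple reading of "segmentation network as attention module", sketched below, is that the predicted lesion mask spatially gates the image before classification. The tiny stand-in networks and layer sizes are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SegAttentionClassifier(nn.Module):
    """A soft segmentation mask spatially gates the input to the classifier."""
    def __init__(self, n_classes=7):
        super().__init__()
        # Tiny stand-ins for real segmentation / classification backbones.
        self.seg = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 1), nn.Sigmoid())
        self.cls = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, n_classes))

    def forward(self, x):
        mask = self.seg(x)          # (B, 1, H, W) soft lesion mask
        return self.cls(x * mask)   # classify only the attended region

logits = SegAttentionClassifier()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 7])
```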
- Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience [21.420302408947194]
We develop a weighted empirical risk minimization approach that optimally combines data from a source group to make predictions on a target group.
We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of Alzheimer's disease and estimation of brain age.
arXiv Detail & Related papers (2023-08-06T18:05:39Z)
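The optimal source/target trade-off is the paper's contribution; the sketch below shows only the generic weighted empirical risk minimization mechanism it builds on, with a hand-set source weight `alpha` standing in for the learned trade-off (all data here is synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Large source cohort, small target cohort with a shifted distribution.
X_src, y_src = rng.normal(size=(1000, 5)), rng.integers(0, 2, 1000)
X_tgt, y_tgt = rng.normal(0.5, 1.0, (50, 5)), rng.integers(0, 2, 50)

X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
# Weighted ERM: every source sample contributes with weight alpha, every
# target sample with weight 1. In practice alpha would be chosen to
# optimize target-group performance (e.g. by cross-validation).
alpha = 0.3
w = np.concatenate([np.full(len(y_src), alpha), np.ones(len(y_tgt))])
clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
print(clf.score(X_tgt, y_tgt))
```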
- Towards unraveling calibration biases in medical image analysis [2.4054878434935074]
We show how several typically employed calibration metrics are systematically biased with respect to sample sizes.
This is of particular relevance to fairness studies, where data imbalance results in drastic sample size differences between demographic sub-groups.
arXiv Detail & Related papers (2023-05-09T00:11:35Z)
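The effect is easy to reproduce: even a perfectly calibrated model has a strictly positive expected calibration error (ECE) estimate at finite sample sizes, so smaller demographic subgroups look worse calibrated than they are. A self-contained demonstration (the equal-width-bin ECE below is one common variant, not necessarily every metric the paper analyzes):

```python
import numpy as np

def ece(y_true, p, n_bins=10):
    """Expected calibration error with equal-width confidence bins."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            err += m.mean() * abs(y_true[m].mean() - p[m].mean())
    return err

rng = np.random.default_rng(0)
for n in [50, 200, 1000, 10000]:
    # Perfectly calibrated by construction: y ~ Bernoulli(p).
    estimates = []
    for _ in range(200):
        p = rng.uniform(size=n)
        y = (rng.uniform(size=n) < p).astype(float)
        estimates.append(ece(y, p))
    print(n, round(float(np.mean(estimates)), 4))
# The true ECE is 0 at every n, yet the estimate only shrinks as n grows:
# sample-size imbalance alone can make a subgroup appear miscalibrated.
```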
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain (CD) learning and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- IA-GCN: Interpretable Attention based Graph Convolutional Network for Disease prediction [47.999621481852266]
We propose an interpretable graph-learning-based model that reveals the clinical relevance of the input features to the task.
In a clinical scenario, such a model can assist clinical experts in better decision-making for diagnosis and treatment planning.
Our proposed model outperforms the compared methods, with average accuracy gains of 3.2% for Tadpole, 1.6% for UKBB Gender, and 2% for the UKBB Age prediction task.
arXiv Detail & Related papers (2021-03-29T13:04:02Z)
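The summary does not spell out the architecture; the sketch below only illustrates one way a graph convolution can expose per-feature clinical relevance, via a learned feature gate whose weights can be inspected after training (this is an assumed simplification, not IA-GCN's actual design):

```python
import torch
import torch.nn as nn

class FeatureAttentionGCNLayer(nn.Module):
    """Graph convolution with a learnable per-feature gate.

    After training, sigmoid(self.att) can be read as a rough relevance
    score for each clinical input feature."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.att = nn.Parameter(torch.zeros(in_dim))  # one gate per feature
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        x = x * torch.sigmoid(self.att)             # gate the input features
        return torch.relu(adj_norm @ self.lin(x))   # propagate over the graph

# Toy population graph of 5 patients with 8 clinical features each; the
# identity matrix stands in for a real normalized patient-similarity graph.
layer = FeatureAttentionGCNLayer(8, 4)
out = layer(torch.randn(5, 8), torch.eye(5))
print(out.shape, torch.sigmoid(layer.att))
```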
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
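A natural remedy for the single-threshold finding is to calibrate one threshold per subgroup at a fixed false-match rate; the sketch below illustrates that on simulated impostor scores (the subgroup names and score distributions are invented for illustration):

```python
import numpy as np

def threshold_at_fmr(impostor_scores, fmr=1e-3):
    """Smallest threshold whose false-match rate on impostor pairs is <= fmr."""
    return float(np.quantile(impostor_scores, 1.0 - fmr))

rng = np.random.default_rng(0)
# Impostor similarity scores per subgroup; the shift mimics the finding
# that score distributions differ across demographic groups.
impostors = {"group_a": rng.normal(0.30, 0.10, 100_000),
             "group_b": rng.normal(0.40, 0.10, 100_000)}
for name, scores in impostors.items():
    print(name, round(threshold_at_fmr(scores), 3))
# A single global threshold would over-accept one group and over-reject the
# other; per-subgroup thresholds equalize the operating point instead.
```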
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
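The Prototypical Network rule the summary builds on is standard: each class prototype is the mean embedding of its support examples, and queries are classified by distance to the prototypes. A minimal sketch on pre-computed embeddings (episode sizes are arbitrary):

```python
import torch

def proto_classify(support, support_labels, query, n_classes):
    """Prototypical-network rule: softmax over negative distances to class means."""
    protos = torch.stack([support[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])
    dists = torch.cdist(query, protos)   # (n_query, n_classes)
    return (-dists).softmax(dim=1)       # higher score = closer prototype

# Toy 3-way, 5-shot episode with 16-dimensional embeddings.
support = torch.randn(15, 16)
labels = torch.arange(3).repeat_interleave(5)
query = torch.randn(4, 16)
print(proto_classify(support, labels, query, 3).argmax(dim=1))
```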