Risk of Training Diagnostic Algorithms on Data with Demographic Bias
- URL: http://arxiv.org/abs/2005.10050v2
- Date: Wed, 17 Jun 2020 11:33:59 GMT
- Title: Risk of Training Diagnostic Algorithms on Data with Demographic Bias
- Authors: Samaneh Abbasi-Sureshjani, Ralf Raumanns, Britt E. J. Michels, Gerard
Schouten, Veronika Cheplygina
- Abstract summary: We conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications.
Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used.
We show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup.
- Score: 0.5599792629509227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the critical challenges in machine learning applications is to have
fair predictions. There are numerous recent examples in various domains that
convincingly show that algorithms trained with biased datasets can easily lead
to erroneous or discriminatory conclusions. This is even more crucial in
clinical applications, where predictive algorithms are designed mainly on a
limited or given set of medical images, and demographic variables such as
age, sex, and race are not taken into account. In this work, we conduct a survey
of the MICCAI 2018 proceedings to investigate the common practice in medical
image analysis applications. Surprisingly, we found that papers focusing on
diagnosis rarely describe the demographics of the datasets used, and the
diagnosis is purely based on images. In order to highlight the importance of
considering the demographics in diagnosis tasks, we used a publicly available
dataset of skin lesions. We then demonstrate that a classifier with an overall
area under the curve (AUC) of 0.83 has variable performance between 0.76 and
0.91 on subgroups based on age and sex, even though the training set was
relatively balanced. Moreover, we show that it is possible to learn unbiased
features by explicitly using demographic variables in an adversarial training
setup, which leads to balanced scores per subgroup. Finally, we discuss the
implications of these results and provide recommendations for further research.
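
The abstract names the technique (adversarial training against demographic variables) but not the implementation; below is a minimal PyTorch sketch of one standard way to realize it with a gradient-reversal layer, where an auxiliary head tries to predict the demographic group while the encoder is pushed to discard that signal. All module names, dimensions, and the group encoding are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (PyTorch) of adversarial debiasing via gradient reversal.
# Assumptions: a toy encoder, integer diagnosis labels, integer group labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_classes=2, n_groups=4, lambd=1.0):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for a CNN backbone
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.diagnosis_head = nn.Linear(feat_dim, n_classes)
        self.adversary = nn.Linear(feat_dim, n_groups)  # predicts age/sex group
        self.lambd = lambd

    def forward(self, x):
        z = self.encoder(x)
        y_logits = self.diagnosis_head(z)
        # The adversary learns to predict the group; the encoder receives the
        # negated gradient, so demographically predictive cues are unlearned.
        g_logits = self.adversary(GradReverse.apply(z, self.lambd))
        return y_logits, g_logits

def total_loss(y_logits, g_logits, y, g):
    return F.cross_entropy(y_logits, y) + F.cross_entropy(g_logits, g)
```

The subgroup analysis reported above (overall AUC 0.83, subgroup AUCs between 0.76 and 0.91) amounts to stratifying the test set by a demographic column before scoring, roughly:

```python
# Per-subgroup AUC: stratify the held-out set by group label, then score.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, groups):
    return {g: roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
```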
Related papers
- Debias-CLR: A Contrastive Learning Based Debiasing Method for Algorithmic Fairness in Healthcare Applications [0.17624347338410748]
We proposed an implicit in-processing debiasing method to combat disparate treatment.
We used clinical notes of heart failure patients, together with their diagnostic codes, procedure reports, and physiological vitals.
We found that Debias-CLR was able to reduce the Single-Category Word Embedding Association Test (SC-WEAT) effect size score when debiasing for gender and ethnicity.
arXiv Detail & Related papers (2024-11-15T19:32:01Z)
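
SC-WEAT is a standard association test from the word-embedding fairness literature; the sketch below computes its effect size for one target embedding against two attribute sets (e.g., male- vs. female-associated terms). The vectors and set choices are placeholders, not the Debias-CLR pipeline itself.

```python
# Single-Category WEAT (SC-WEAT) effect size: a Cohen's-d-style measure of how
# much more a target embedding associates with attribute set A than with set B.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sc_weat_effect_size(target, attrs_a, attrs_b):
    sims_a = np.array([cosine(target, a) for a in attrs_a])
    sims_b = np.array([cosine(target, b) for b in attrs_b])
    pooled = np.concatenate([sims_a, sims_b])
    # Values near 0 indicate a weaker association bias after debiasing.
    return (sims_a.mean() - sims_b.mean()) / pooled.std(ddof=1)
```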
- Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), and Pseudo-Label strategies.
LwF and Pseudo-Label achieve the best classification performance, but when fairness metrics are included in the evaluation, Pseudo-Label is clearly less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z)
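
One way to make "including fairness metrics in the evaluation" concrete is to log a per-group performance gap on a fixed held-out set after every task in the continual-learning sequence. The recall-gap metric below is an illustrative choice on our part, not necessarily the one used in the paper.

```python
# Equal-opportunity-style gap: difference in recall between demographic groups,
# re-computed on the same held-out set after each continual-learning task.
import numpy as np

def recall(y_true, y_pred):
    pos = y_true == 1
    return float((y_pred[pos] == 1).mean()) if pos.any() else float("nan")

def recall_gap(y_true, y_pred, groups):
    per_group = [recall(y_true[groups == g], y_pred[groups == g])
                 for g in np.unique(groups)]
    return max(per_group) - min(per_group)  # 0.0 means equal recall everywhere
```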
- Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging [13.141767097232796]
Self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations.
It is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients.
This study investigates the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally-sourced datasets.
arXiv Detail & Related papers (2024-02-22T18:59:53Z)
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We present a data-centric, model-agnostic, task-agnostic approach (AEquity) that evaluates dataset bias by investigating how easily different groups are learned at small sample sizes.
We then apply a systematic analysis of AEq values across subpopulations to identify manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
arXiv Detail & Related papers (2023-11-06T17:08:41Z)
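
The entry describes AEquity as comparing how easily different groups are learned at small sample sizes. A plausible reading is a learning-curve probe per subgroup: train a small model on growing samples of each group's data and record held-out accuracy. Every modeling choice in the sketch below (classifier, sample sizes, split) is an assumption, not the paper's metric.

```python
# Learning-curve probe per subgroup: how fast does a simple classifier reach
# good accuracy on each group's data? Persistent gaps flag harder-to-learn
# (often under-served) subpopulations. Illustrative stand-in for AEquity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def learnability_curve(X, y, sizes=(25, 50, 100, 200), seed=0):
    # Assumes both classes appear in each training slice.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    scores = []
    for n in sizes:
        n = min(n, len(X_tr))
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
        scores.append(clf.score(X_te, y_te))
    return scores

def group_curves(X, y, groups):
    return {g: learnability_curve(X[groups == g], y[groups == g])
            for g in np.unique(groups)}
```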
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data.
The proposed approach comprises a fusion of a segmentation network, which acts as an attention module, and a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
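
A minimal sketch of that fusion idea: a segmentation branch produces a soft lesion mask that spatially gates the classifier's features. The tiny stand-in layers are assumptions; the paper's actual networks are larger.

```python
# Segmentation-as-attention: a mask from the segmentation branch re-weights
# the classification features so the classifier focuses on the lesion region.
import torch
import torch.nn as nn

class SegAttentionClassifier(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.seg = nn.Conv2d(3, 1, 3, padding=1)        # stand-in seg. network
        self.backbone = nn.Conv2d(3, 32, 3, padding=1)  # stand-in feature net
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (B, 3, H, W) RGB images
        attn = torch.sigmoid(self.seg(x))      # soft lesion mask in (0, 1)
        feats = self.backbone(x) * attn        # attend to the lesion region
        return self.head(feats.mean(dim=(2, 3)))  # global avg pool + classify
```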
- Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience [21.420302408947194]
We develop a weighted empirical risk minimization approach that optimally combines data from a source group to make predictions on a target group.
We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of Alzheimer's disease and estimation of brain age.
arXiv Detail & Related papers (2023-08-06T18:05:39Z)
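
Weighted empirical risk minimization here means minimizing a loss in which source samples are re-weighted to serve the target group. The abstract does not give the weighting rule, so the sketch below takes per-sample weights as an input placeholder; choosing them optimally is the paper's contribution.

```python
# Weighted ERM sketch: minimize sum_i w_i * loss(f(x_i), y_i), where weights
# up-weight the source samples most relevant to the target population.
import torch
import torch.nn as nn

def weighted_erm_step(model, optimizer, x, y, w):
    optimizer.zero_grad()
    per_sample = nn.functional.cross_entropy(model(x), y, reduction="none")
    loss = (w * per_sample).sum() / w.sum()   # weight-normalized empirical risk
    loss.backward()
    optimizer.step()
    return float(loss)
```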
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- IA-GCN: Interpretable Attention based Graph Convolutional Network for Disease prediction [47.999621481852266]
We propose an interpretable graph learning-based model which interprets the clinical relevance of the input features towards the task.
In a clinical scenario, such a model can assist the clinical experts in better decision-making for diagnosis and treatment planning.
Our proposed model outperforms the compared methods, with average accuracy gains of 3.2% for Tadpole, 1.6% for UKBB Gender, and 2% for the UKBB Age prediction task.
arXiv Detail & Related papers (2021-03-29T13:04:02Z)
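
A dense stand-in for interpretable attention over input features in a graph convolution: a learned gate per clinical feature can be read off after training to see which inputs mattered. This is our simplification, not the authors' IA-GCN layer.

```python
# Graph convolution with a learned per-feature gate; inspecting the trained
# gate values gives a crude interpretation of input-feature relevance.
import torch
import torch.nn as nn

class GatedGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.feature_gate = nn.Parameter(torch.zeros(in_dim))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) patient features; adj: row-normalized adjacency.
        gate = torch.sigmoid(self.feature_gate)  # per-feature relevance (0, 1)
        return torch.relu(adj @ self.lin(x * gate))
```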
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
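
If a single global threshold is suboptimal, the obvious baseline fix is calibrating one threshold per demographic subgroup. The sketch below picks each group's threshold to hit a common false-match rate, which is one reasonable criterion among several, not necessarily the paper's.

```python
# Per-subgroup verification thresholds: choose each group's threshold so that
# impostor (non-match) pairs exceed it at a fixed false-match rate (FMR).
import numpy as np

def per_group_thresholds(scores, is_genuine, groups, fmr=1e-3):
    thresholds = {}
    for g in np.unique(groups):
        impostor = scores[(groups == g) & ~is_genuine]
        # Threshold = (1 - fmr) quantile of impostor scores within the group.
        thresholds[g] = float(np.quantile(impostor, 1.0 - fmr))
    return thresholds
```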
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
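
For reference, the core of a Prototypical Network fits in a few lines: class prototypes are mean embeddings of the support set, and queries are classified by negative distance to each prototype. The embedding function producing `support_emb` and `query_emb` is a placeholder.

```python
# Prototypical-network core: prototypes are mean support embeddings; queries
# score against each prototype by negative squared Euclidean distance.
import torch

def prototypes(support_emb, support_labels, n_classes):
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def proto_logits(query_emb, protos):
    # (n_query, n_classes) logits; a softmax over these gives probabilities.
    return -torch.cdist(query_emb, protos).pow(2)
```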
This list is automatically generated from the titles and abstracts of the papers in this site.