Towards unraveling calibration biases in medical image analysis
- URL: http://arxiv.org/abs/2305.05101v1
- Date: Tue, 9 May 2023 00:11:35 GMT
- Title: Towards unraveling calibration biases in medical image analysis
- Authors: María Agustina Ricci Lara, Candelaria Mosquera, Enzo Ferrante, Rodrigo Echeveste
- Abstract summary: We show how several typically employed calibration metrics are systematically biased with respect to sample sizes.
This is of particular relevance to fairness studies, where data imbalance results in drastic sample size differences between demographic sub-groups.
- Score: 2.4054878434935074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years the development of artificial intelligence (AI) systems for
automated medical image analysis has gained enormous momentum. At the same
time, a large body of work has shown that AI systems can systematically and
unfairly discriminate against certain populations in various application
scenarios. These two facts have motivated the emergence of algorithmic fairness
studies in this field. Most research on healthcare algorithmic fairness to date
has focused on the assessment of biases in terms of classical discrimination
metrics such as AUC and accuracy. Potential biases in terms of model
calibration, however, have only recently begun to be evaluated. This is
especially important when working with clinical decision support systems, as
predictive uncertainty is key for health professionals to optimally evaluate
and combine multiple sources of information. In this work we study
discrimination and calibration biases in models trained for automatic detection
of malignant dermatological conditions from skin lesion images. Importantly,
we show how several typically employed calibration metrics are systematically
biased with respect to sample sizes, and how this can lead to erroneous
fairness analysis if not taken into consideration. This is of particular
relevance to fairness studies, where data imbalance results in drastic sample
size differences between demographic sub-groups, which, if not taken into
account, can act as confounders.
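The sample-size bias of calibration metrics can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a binary classifier and uses a standard equal-width-binning estimator of the expected calibration error (ECE). Even for a perfectly calibrated model, whose labels are drawn exactly from its predicted probabilities, the estimated ECE grows as the sample size shrinks, mimicking what happens when the metric is computed on a small demographic sub-group.

```python
# Minimal sketch (not the paper's code): estimated ECE of a perfectly
# calibrated model rises as the evaluation sample shrinks.
import numpy as np

def ece(probs, labels, n_bins=10):
    """Equal-width-binning estimator of the expected calibration error."""
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)  # bin index per sample
    total = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        conf = probs[mask].mean()   # mean predicted probability in the bin
        acc = labels[mask].mean()   # empirical positive rate in the bin
        total += mask.mean() * abs(acc - conf)
    return total

rng = np.random.default_rng(0)
for n in (10_000, 1_000, 100, 30):           # shrinking sub-group sizes
    estimates = []
    for _ in range(200):                      # repeat to average out noise
        p = rng.uniform(0.05, 0.95, size=n)   # predicted probabilities
        y = rng.binomial(1, p)                # labels drawn from p => perfectly calibrated
        estimates.append(ece(p, y))
    print(f"n={n:6d}  mean estimated ECE ~ {np.mean(estimates):.3f}")
```

A perfectly calibrated model should yield an ECE near zero at every n; the upward drift for small n is purely an estimation artifact, which is why comparing raw calibration scores across sub-groups of very different sizes can mislead a fairness analysis.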
Related papers
- Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging [2.0890189482817165]
We introduce a novel analysis framework for investigating the impact of biases in medical images on AI models.
We developed and tested this framework for conducting controlled in silico trials to assess bias in medical imaging AI.
arXiv Detail & Related papers (2023-11-03T01:37:28Z)
- Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes [72.13373216644021]
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context.
We find deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available.
These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
arXiv Detail & Related papers (2023-07-12T01:11:52Z)
- Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis [2.8943928153775826]
The proposed Cluster-Focal method first identifies poorly calibrated samples, clusters them into groups, and then applies a group-wise focal loss to mitigate calibration bias (see the sketch after this list).
We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients.
arXiv Detail & Related papers (2023-07-04T14:14:12Z)
- MEDFAIR: Benchmarking Fairness for Medical Imaging [44.73351338165214]
MEDFAIR is a framework to benchmark the fairness of machine learning models for medical imaging.
We find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes.
We make recommendations for different medical application scenarios that require different ethical principles.
arXiv Detail & Related papers (2022-10-04T16:30:47Z)
- Anatomizing Bias in Facial Analysis [86.79402670904338]
Existing facial analysis systems have been shown to yield biased results against certain demographic subgroups.
It has become imperative to ensure that these systems do not discriminate based on gender, identity, or skin tone of individuals.
This has led to research in the identification and mitigation of bias in AI systems.
arXiv Detail & Related papers (2021-12-13T09:51:13Z)
- Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and the agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
- Risk of Training Diagnostic Algorithms on Data with Demographic Bias [0.5599792629509227]
We conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications.
Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used.
We show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup.
arXiv Detail & Related papers (2020-05-20T13:51:01Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential service breakdowns and collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
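As referenced in the Cluster-Focal entry above, the following is a minimal sketch of a group-wise focal loss. It assumes each cluster of samples receives its own focusing parameter gamma; the function name, signature, and clustering step are illustrative assumptions, not taken from the cited paper's implementation.

```python
# Minimal sketch (assumptions, not the Cluster-Focal authors' code): a
# group-wise focal loss where each pre-computed cluster of samples gets
# its own focusing parameter gamma.
import torch
import torch.nn.functional as F

def groupwise_focal_loss(logits, targets, group_ids, gamma_per_group):
    """logits, targets: shape (N,) for a binary task; group_ids: cluster index
    per sample; gamma_per_group: dict mapping cluster index -> gamma."""
    p = torch.sigmoid(logits)
    pt = torch.where(targets == 1, p, 1 - p)   # probability assigned to the true class
    bce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
    gammas = torch.tensor([gamma_per_group[int(g)] for g in group_ids],
                          dtype=logits.dtype, device=logits.device)
    return ((1.0 - pt) ** gammas * bce).mean()  # larger gamma -> stronger focus on hard samples

# Hypothetical usage: clusters 0 and 1 are assumed to have been found beforehand
# (e.g. by clustering per-sample calibration error); the poorly calibrated
# cluster (here cluster 1) gets the larger focusing parameter.
logits = torch.randn(8)
targets = torch.randint(0, 2, (8,))
group_ids = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
loss = groupwise_focal_loss(logits, targets, group_ids, {0: 1.0, 1: 3.0})
```

Giving the poorly calibrated cluster a larger gamma down-weights easy, well-fit samples in that cluster, so training capacity shifts toward the examples driving the calibration gap.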