(Predictable) Performance Bias in Unsupervised Anomaly Detection
- URL: http://arxiv.org/abs/2309.14198v1
- Date: Mon, 25 Sep 2023 14:57:43 GMT
- Title: (Predictable) Performance Bias in Unsupervised Anomaly Detection
- Authors: Felix Meissen, Svenja Breuer, Moritz Knolle, Alena Buyx, Ruth
Müller, Georgios Kaissis, Benedikt Wiestler, Daniel Rückert
- Abstract summary: Unsupervised anomaly detection (UAD) models promise to aid in the crucial first step of disease detection.
Our study quantified the disparate performance of UAD models against certain demographic subgroups.
- Score: 3.826262429926079
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Background: With the ever-increasing amount of medical imaging data, the
demand for algorithms to assist clinicians has amplified. Unsupervised anomaly
detection (UAD) models promise to aid in the crucial first step of disease
detection. While previous studies have thoroughly explored fairness in
supervised models in healthcare, fairness in UAD has so far remained unexplored.
Methods: In this study, we evaluated how dataset composition regarding
subgroups manifests in disparate performance of UAD models along multiple
protected variables on three large-scale publicly available chest X-ray
datasets. Our experiments were validated using two state-of-the-art UAD models
for medical images. Finally, we introduced a novel subgroup-AUROC (sAUROC)
metric, which aids in quantifying fairness in machine learning.
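As a minimal sketch, the sAUROC could be computed as follows, assuming (one
plausible reading of the abstract) that the positive class is restricted to the
anomalous samples of a single subgroup while the pool of normal samples is
shared across all subgroups; the function name, subgroup labels, and simulated
scores below are illustrative, not the paper's reference implementation:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc(scores, labels, groups, target_group):
    """Hypothetical sAUROC: AUROC with positives (anomalies) restricted to
    one subgroup and negatives (normals) shared across all subgroups.

    scores: anomaly scores from the UAD model (higher = more anomalous)
    labels: 1 = anomalous (diseased), 0 = normal (healthy)
    groups: demographic subgroup label per sample
    """
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    pos = (labels == 1) & (groups == target_group)  # subgroup anomalies only
    neg = labels == 0                               # all normal samples
    mask = pos | neg
    return roc_auc_score(labels[mask], scores[mask])

# Toy example: a model whose scores separate subgroup A's anomalies
# better than subgroup B's, producing a visible sAUROC gap.
rng = np.random.default_rng(0)
n = 400
groups = rng.choice(["A", "B"], size=n)
labels = rng.integers(0, 2, size=n)
scores = rng.normal(0.0, 1.0, size=n)
scores[(labels == 1) & (groups == "A")] += 2.0
scores[(labels == 1) & (groups == "B")] += 0.5
print(subgroup_auroc(scores, labels, groups, "A"))  # high
print(subgroup_auroc(scores, labels, groups, "B"))  # lower
```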
Findings: Our experiments revealed empirical "fairness laws" (similar to
"scaling laws" for Transformers) for training-dataset composition: Linear
relationships between anomaly detection performance within a subpopulation and
its representation in the training data. Our study further revealed performance
disparities, even in the case of balanced training data, and compound effects
that exacerbate the drop in performance for subjects associated with multiple
adversely affected groups.
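As a hedged illustration of what such a "fairness law" looks like in practice,
one can fit a line to (training representation, sAUROC) pairs; the fractions
and sAUROC values below are invented for demonstration and do not come from
the paper:

```python
import numpy as np

# Hypothetical measurements: the subgroup's share of the training data and
# the sAUROC measured after training a UAD model on each composition.
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
saurocs = np.array([0.62, 0.66, 0.70, 0.74, 0.78])  # illustrative values

# Least-squares line: sAUROC ~ slope * representation + intercept.
slope, intercept = np.polyfit(fractions, saurocs, deg=1)
print(f"sAUROC ~ {slope:.3f} * representation + {intercept:.3f}")

# A linear fit lets one estimate performance for unseen compositions:
print(f"predicted sAUROC at 60% representation: {slope * 0.6 + intercept:.3f}")
```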
Interpretation: Our study quantified the disparate performance of UAD models
against certain demographic subgroups. Importantly, we showed that this
unfairness cannot be mitigated by balanced representation alone. Instead, the
representations of some subgroups appear to be harder for UAD models to learn
than those of others. The empirical fairness laws discovered in our study make disparate
performance in UAD models easier to estimate and aid in determining the most
desirable dataset composition.
Related papers
- Intuitionistic Fuzzy Universum Twin Support Vector Machine for Imbalanced Data [0.0]
One of the major difficulties in machine learning is classifying imbalanced datasets.
We propose intuitionistic fuzzy universum twin support vector machines for imbalanced data (IFUTSVM-ID).
We use an intuitionistic fuzzy membership scheme to mitigate the impact of noise and outliers.
arXiv Detail & Related papers (2024-10-27T04:25:42Z) - UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework [59.428668614618914]
We take a deeper look into the diverse factors that influence the efficacy of modern unsupervised domain adaptation (UDA) methods.
To facilitate our analysis, we first develop UDA-Bench, a novel PyTorch framework that standardizes training and evaluation for domain adaptation.
arXiv Detail & Related papers (2024-09-23T17:57:07Z) - Does Data-Efficient Generalization Exacerbate Bias in Foundation Models? [2.298227866545911]
Foundation models have emerged as robust, label-efficient models across diverse domains.
It is unclear whether using a large amount of unlabeled data, biased by the presence of sensitive attributes during pre-training, influences the fairness of the model.
This research examines bias in a foundation model when it is fine-tuned on the Brazilian Multilabel Ophthalmological dataset.
arXiv Detail & Related papers (2024-08-28T22:14:44Z) - Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods [5.274804664403783]
We use Slice Discovery Methods (SDMs) to identify interpretable, underperforming subsets of the data and to form hypotheses about the causes of observed performance disparities.
Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
arXiv Detail & Related papers (2024-06-17T23:08:46Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
We benchmark generation results on the CIFAR100/CIFAR100LT datasets and show strong performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Mitigating Health Disparities in EHR via Deconfounder [5.511343163506091]
We propose a novel framework, Parity Medical Deconfounder (PriMeD), to deal with the disparity issue in healthcare datasets.
PriMeD adopts a Conditional Variational Autoencoder (CVAE) to learn latent factors (substitute confounders) for observational data.
arXiv Detail & Related papers (2022-10-28T05:16:50Z) - Potential sources of dataset bias complicate investigation of
underdiagnosis by machine learning algorithms [20.50071537200745]
Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates.
The study concludes that the models exhibit and potentially even amplify systematic underdiagnosis.
arXiv Detail & Related papers (2022-01-19T20:51:38Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted dataset, with sample weights computed via influence functions using a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - On the Efficacy of Adversarial Data Collection for Question Answering:
Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z) - Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z) - Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories.
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)