The (de)biasing effect of GAN-based augmentation methods on skin lesion
images
- URL: http://arxiv.org/abs/2206.15182v1
- Date: Thu, 30 Jun 2022 10:32:35 GMT
- Title: The (de)biasing effect of GAN-based augmentation methods on skin lesion
images
- Authors: Agnieszka Mikołajczyk, Sylwia Majchrowska, Sandra Carrasco Limeros
- Abstract summary: New medical datasets might still be a source of spurious correlations that affect the learning process.
One approach to alleviate the data imbalance is using data augmentation with Generative Adversarial Networks (GANs).
This work explored unconditional and conditional GANs to compare their bias inheritance and how the synthetic data influenced the models.
- Score: 3.441021278275805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: New medical datasets are now more open to the public, allowing for better and
more extensive research. Although prepared with the utmost care, new datasets
might still be a source of spurious correlations that affect the learning
process. Moreover, data collections are usually not large enough and are often
unbalanced. One approach to alleviate the data imbalance is using data
augmentation with Generative Adversarial Networks (GANs) to extend the dataset
with high-quality images. GANs are usually trained on the same biased datasets
as the target data, resulting in more biased instances. This work explored
unconditional and conditional GANs to compare their bias inheritance and how
the synthetic data influenced the models. We provided extensive manual data
annotation of possibly biasing artifacts on the well-known ISIC dataset with
skin lesions. In addition, we examined classification models trained on both
real and synthetic data with counterfactual bias explanations. Our experiments
showed that GANs inherited biases and sometimes even amplified them, leading to
even stronger spurious correlations. Manual data annotation and synthetic
images are publicly available for reproducible scientific research.
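The counterfactual bias explanations mentioned in the abstract can be illustrated with a minimal sketch: neutralize a suspected artifact region (e.g., a ruler or sticker on a skin lesion image) and measure how much the model's prediction shifts. All names here (`counterfactual_bias_score`, `biased_classifier`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def counterfactual_bias_score(image, classifier, mask):
    """Measure how much a prediction shifts when a suspected
    artifact region (mask == True) is neutralized."""
    original = classifier(image)
    edited = image.copy()
    edited[mask] = image.mean()  # replace artifact pixels with the mean intensity
    counterfactual = classifier(edited)
    return abs(original - counterfactual)

# Toy classifier that (spuriously) keys on the top-left corner patch,
# mimicking a model that learned an artifact instead of the lesion.
def biased_classifier(img):
    return float(img[:4, :4].mean())

rng = np.random.default_rng(0)
img = rng.random((32, 32))
artifact_mask = np.zeros((32, 32), dtype=bool)
artifact_mask[:4, :4] = True  # suspected artifact location

score = counterfactual_bias_score(img, biased_classifier, artifact_mask)
```

A large score indicates the prediction depends on the artifact region; a classifier that ignores the masked region yields a score of zero.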
Related papers
- Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models? [29.71939692883025]
We investigate the effects of generated data on image classification tasks, with a specific focus on bias.
Hundreds of experiments are conducted on Colorized MNIST, CIFAR-20/100, and Hard ImageNet datasets.
Our findings contribute to the ongoing debate on the implications of synthetic data for fairness in real-world applications.
arXiv Detail & Related papers (2024-10-14T05:07:06Z)
- Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
- DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection [9.801159950963306]
We propose DiffInject, a powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model.
Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing.
arXiv Detail & Related papers (2024-06-10T09:45:38Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
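Data reweighting of the kind summarized above can be sketched with a simple inverse-frequency scheme: give each (label, attribute) group equal total weight so no spurious pairing dominates the loss. This is a minimal illustration, not the optimization-based method of the cited paper; the names and the example attribute values are hypothetical.

```python
import numpy as np
from collections import Counter

def balancing_weights(labels, attributes):
    """Inverse-frequency weights per (label, attribute) group, so that
    every group contributes equally in expectation to a weighted loss."""
    pairs = list(zip(labels, attributes))
    counts = Counter(pairs)
    n_groups = len(counts)
    n = len(pairs)
    # sample weight = n / (n_groups * group_size); each group sums to n / n_groups
    return np.array([n / (n_groups * counts[p]) for p in pairs])

# Toy skin-lesion setup: label 1 spuriously co-occurs with a "ruler" artifact.
labels     = [1, 1, 1, 0, 0, 0, 0, 0]
attributes = ["ruler", "ruler", "none", "none", "none", "none", "none", "ruler"]
w = balancing_weights(labels, attributes)
```

Rare pairings such as (0, "ruler") receive larger weights, while the over-represented (0, "none") group is down-weighted; the weights sum to the number of samples.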
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Analyzing Bias in Diffusion-based Face Generation Models [75.80072686374564]
Diffusion models are increasingly popular in synthetic data generation and image editing applications.
We investigate the presence of bias in diffusion-based face generation models with respect to attributes such as gender, race, and age.
We examine how dataset size affects the attribute composition and perceptual quality of both diffusion and Generative Adversarial Network (GAN) based face generation models.
arXiv Detail & Related papers (2023-05-10T18:22:31Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Transitioning from Real to Synthetic data: Quantifying the bias in model [1.6134566438137665]
This study aims to establish a trade-off between bias and fairness in the models trained using synthetic data.
We demonstrate that varying levels of bias impact exist in models trained using synthetic data.
arXiv Detail & Related papers (2021-05-10T06:57:14Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Detect and Correct Bias in Multi-Site Neuroimaging Datasets [2.750124853532831]
We combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in neuroimaging.
We take a closer look at confounding bias, which is often viewed as the main shortcoming in observational studies.
We propose an extension of the recently introduced ComBat algorithm to control for global variation across image features.
arXiv Detail & Related papers (2020-02-12T15:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.