A Systematic Study on Quantifying Bias in GAN-Augmented Data
- URL: http://arxiv.org/abs/2308.13554v1
- Date: Wed, 23 Aug 2023 22:19:48 GMT
- Title: A Systematic Study on Quantifying Bias in GAN-Augmented Data
- Authors: Denis Liu
- Abstract summary: Generative adversarial networks (GANs) have recently become a popular data augmentation technique used by machine learning practitioners.
They have been shown to suffer from the so-called mode collapse failure mode, which makes them vulnerable to exacerbating biases on already skewed datasets.
This study is a systematic effort focused on the evaluation of state-of-the-art metrics that can potentially quantify biases in GAN-augmented data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative adversarial networks (GANs) have recently become a popular data
augmentation technique used by machine learning practitioners. However, they
have been shown to suffer from the so-called mode collapse failure mode, which
makes them vulnerable to exacerbating biases on already skewed datasets,
resulting in the generated data distribution being less diverse than the
training distribution. To this end, we address the problem of quantifying the
extent to which mode collapse occurs. This study is a systematic effort focused
on the evaluation of state-of-the-art metrics that can potentially quantify
biases in GAN-augmented data. We show that, while several such methods are
available, there is no single metric that quantifies bias exacerbation reliably
over the span of different image domains.
Related papers
- Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss [0.4249842620609682]
We build on the invariant statistical loss (ISL) method introduced in [de2024training].
We extend it to handle heavy-tailed and multivariate data distributions.
We assess its performance in generative modeling and explore its potential as a pretraining technique for generative adversarial networks (GANs).
arXiv Detail & Related papers (2024-10-29T10:27:50Z) - Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems [39.58317527488534]
Data-driven population-level distributions are emerging as an appealing alternative to simple parametric priors in inverse problems.
It is difficult to acquire independent and identically distributed samples from the underlying data-generating process of interest to train these models.
We show that starting from a misspecified prior distribution, the updated distribution becomes progressively closer to the underlying population-level distribution.
arXiv Detail & Related papers (2024-07-24T22:39:27Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - On Counterfactual Data Augmentation Under Confounding [30.76982059341284]
Counterfactual data augmentation has emerged as a method to mitigate confounding biases in the training data.
These biases arise due to various observed and unobserved confounding variables in the data generation process.
We show how our simple augmentation method helps existing state-of-the-art methods achieve good results.
arXiv Detail & Related papers (2023-05-29T16:20:23Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z) - Don't Discard All the Biased Instances: Investigating a Core Assumption in Dataset Bias Mitigation Techniques [19.252319300590656]
Existing techniques for mitigating dataset bias often leverage a biased model to identify biased instances.
The role of these biased instances is then reduced during the training of the main model to enhance its robustness to out-of-distribution data.
In this paper, we show that this assumption does not hold in general.
arXiv Detail & Related papers (2021-09-01T10:25:46Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.