DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative
Networks
- URL: http://arxiv.org/abs/2110.12884v1
- Date: Mon, 25 Oct 2021 12:39:56 GMT
- Title: DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative
Networks
- Authors: Boris van Breugel, Trent Kyono, Jeroen Berrevoets, Mihaela van der
Schaar
- Abstract summary: We introduce DECAF: a GAN-based fair synthetic data generator for tabular data.
We show that DECAF successfully removes undesired bias and is capable of generating high-quality synthetic data.
We provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
- Score: 71.6879432974126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models have been criticized for reflecting unfair biases in
the training data. Instead of solving for this by introducing fair learning
algorithms directly, we focus on generating fair synthetic data, such that any
downstream learner is fair. Generating fair synthetic data from unfair data -
while remaining truthful to the underlying data-generating process (DGP) - is
non-trivial. In this paper, we introduce DECAF: a GAN-based fair synthetic data
generator for tabular data. With DECAF we embed the DGP explicitly as a
structural causal model in the input layers of the generator, allowing each
variable to be reconstructed conditioned on its causal parents. This procedure
enables inference-time debiasing, where biased edges can be strategically
removed to satisfy user-defined fairness requirements. The DECAF framework
is versatile and compatible with several popular definitions of fairness. In
our experiments, we show that DECAF successfully removes undesired bias and -
in contrast to existing methods - is capable of generating high-quality
synthetic data. Furthermore, we provide theoretical guarantees on the
generator's convergence and the fairness of downstream models.
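To make the mechanism concrete, below is a minimal sketch of DECAF-style generation, not the authors' implementation: variables are sampled in topological order of a structural causal model, each from independent noise plus its causal parents, so removing an edge at sampling time cuts that path of influence. The three-variable DAG (sensitive attribute A, mediator X, outcome Y) and the linear per-variable "generators" are hypothetical stand-ins for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-variable DAG: sensitive attribute A, mediator X, outcome Y.
dag = {"A": [], "X": ["A"], "Y": ["A", "X"]}

# Stand-ins for the learned per-variable generator networks (linear here).
def gen_A(parents, z):
    return (z > 0).astype(float)

def gen_X(parents, z):
    return 0.8 * parents.get("A", 0.0) + 0.2 * z

def gen_Y(parents, z):
    return 1.5 * parents.get("A", 0.0) + 1.0 * parents.get("X", 0.0) + 0.1 * z

generators = {"A": gen_A, "X": gen_X, "Y": gen_Y}

def sample(n, removed_edges=frozenset()):
    """Sample n rows in topological order, skipping any removed parent edge."""
    data = {}
    for var in ("A", "X", "Y"):  # topological order of the DAG
        parents = {p: data[p] for p in dag[var] if (p, var) not in removed_edges}
        data[var] = generators[var](parents, rng.standard_normal(n))
    return data

biased = sample(10_000)
debiased = sample(10_000, removed_edges={("A", "Y")})  # cut the direct A -> Y edge

for name, d in (("biased", biased), ("debiased", debiased)):
    gap = d["Y"][d["A"] == 1].mean() - d["Y"][d["A"] == 0].mean()
    print(f"{name}: mean outcome gap across A = {gap:.2f}")
```

In DECAF itself the per-variable functions are neural networks trained adversarially against a discriminator; the edge-removal step at inference time is the same idea.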
Related papers
- Data Augmentation via Diffusion Model to Enhance AI Fairness [1.2979015577834876]
This paper explores the potential of diffusion models to generate synthetic data to improve AI fairness.
The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM) was utilized with different amounts of generated data for data augmentation.
Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
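A hedged sketch of the augmentation protocol this summary describes: mix increasing amounts of synthetic rows into a biased training set, retrain, and track a fairness metric. The "synthetic" rows below come from a hand-written balanced distribution standing in for Tab-DDPM output; all names and numbers are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def demographic_parity_gap(model, X, a):
    yhat = model.predict(X)
    return abs(yhat[a == 1].mean() - yhat[a == 0].mean())

# Biased "real" data: the label partly depends on group membership.
n = 2000
a = rng.integers(0, 2, n)
x = rng.standard_normal((n, 3)) + a[:, None] * 0.5
y = ((x[:, 0] + 0.8 * (a == 0)) > 0.5).astype(int)
X = np.column_stack([x, a])

for n_syn in [0, 1000, 4000]:  # vary the amount of generated data
    # Stand-in for Tab-DDPM samples: label no longer depends on the group.
    a_s = rng.integers(0, 2, n_syn)
    x_s = rng.standard_normal((n_syn, 3)) + a_s[:, None] * 0.5
    y_s = (x_s[:, 0] > 0.5).astype(int)
    X_aug = np.vstack([X, np.column_stack([x_s, a_s])])
    y_aug = np.concatenate([y, y_s])
    clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    print(n_syn, "synthetic rows -> DP gap:",
          round(demographic_parity_gap(clf, X, a), 3))
```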
arXiv Detail & Related papers (2024-10-20T18:52:31Z)
- Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation [4.1942958779358674]
Existing bias-mitigating generative methods require in-processing fairness objectives and do not account for computational overhead.
We present a fair data generation technique based on knowledge distillation, where we use a small architecture to distill the fair representation in the latent space.
Our approach shows a 5%, 5%, and 10% improvement in fairness, synthetic sample quality, and data utility, respectively, over the state-of-the-art fair generative model.
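One plausible reading of the distillation step, sketched under stated assumptions: a small student encoder is trained to match the latent codes of a larger, frozen teacher that already carries the fair representation, so the cheap student can drive downstream generation. The architectures and the plain MSE loss are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen teacher (assumed to already encode a fair latent space) and a
# much smaller student to be distilled into.
teacher = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 8))
student = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 8))
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher stays fixed during distillation

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(300):
    x = torch.randn(128, 10)  # stand-in tabular batch
    loss = nn.functional.mse_loss(student(x), teacher(x))  # match latents
    opt.zero_grad(); loss.backward(); opt.step()

print("final distillation MSE:", float(loss))
```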
arXiv Detail & Related papers (2024-08-20T11:37:52Z)
- How Far Can Fairness Constraints Help Recover From Biased Data? [9.430687114814997]
A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen.
Contrary to this belief, Blum & Stangl show that fair classification with equal opportunity constraints can recover optimally accurate and fair classifiers on the original data distribution, even from extremely biased data.
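For reference, the equal opportunity constraint invoked here (Hardt et al., 2016) requires equal true-positive rates across protected groups. A small checker, with synthetic predictions purely for illustration:

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, a):
    """Absolute difference in true-positive rates between groups a=0 and a=1."""
    tpr = lambda g: y_pred[(y_true == 1) & (a == g)].mean()
    return abs(tpr(0) - tpr(1))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
a = rng.integers(0, 2, 1000)
# Illustrative biased predictor: lower true-positive rate for group a=0.
yhat = (y & (a | (rng.random(1000) > 0.3))).astype(int)
print("EO gap:", round(equal_opportunity_gap(y, yhat, a), 3))
```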
arXiv Detail & Related papers (2023-12-16T09:49:31Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
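A minimal sketch of the ensemble idea, with a trivial per-class Gaussian fit on bootstrap resamples standing in for the deep generators: each member approximates one draw from the posterior over generator parameters, and downstream predictions are averaged across members.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy real data.
X = rng.standard_normal((500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def fit_generator(X, y, seed):
    """Stand-in 'deep generator': per-class Gaussian fit on a bootstrap resample."""
    r = np.random.default_rng(seed)
    idx = r.integers(0, len(X), len(X))
    Xb, yb = X[idx], y[idx]
    stats = {c: (Xb[yb == c].mean(0), Xb[yb == c].std(0) + 1e-6) for c in (0, 1)}
    def sample(n):
        ys = r.integers(0, 2, n)
        xs = np.stack([r.normal(*stats[c]) for c in ys])
        return xs, ys
    return sample

K = 5  # ensemble size: one member per approximate posterior draw
members = []
for k in range(K):
    Xs, ys = fit_generator(X, y, seed=k)(500)
    members.append(LogisticRegression(max_iter=200).fit(Xs, ys))

# Aggregate: average predicted probabilities over the ensemble.
X_test = rng.standard_normal((5, 2))
p = np.mean([m.predict_proba(X_test)[:, 1] for m in members], axis=0)
print("ensembled P(y=1):", p.round(2))
```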
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
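A heavily hedged sketch of the weight-perturbation idea: besides the task loss, penalize the fairness gap at an adversarially perturbed copy of the weights (radius rho), so that fairness survives the small weight shifts a distribution shift would induce. The model, the finite-difference gradients, and all step sizes are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 3))
a = rng.integers(0, 2, 400)
y = ((X[:, 0] + 0.6 * a) > 0).astype(int)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fairness_gap(w):
    p = sigmoid(X @ w)
    return abs(p[a == 1].mean() - p[a == 0].mean())

def grad_fairness(w, eps=1e-4):
    # Finite-difference gradient of the fairness gap (keeps the sketch short).
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (fairness_gap(w + e) - fairness_gap(w - e)) / (2 * eps)
    return g

w, lr, lam, rho = np.zeros(3), 0.1, 1.0, 0.05
for _ in range(200):
    p = sigmoid(X @ w)
    grad_task = X.T @ (p - y) / len(y)                      # logistic-loss gradient
    g_f = grad_fairness(w)
    w_adv = w + rho * g_f / (np.linalg.norm(g_f) + 1e-12)   # worst-case weights
    w -= lr * (grad_task + lam * grad_fairness(w_adv))      # regularized step

print("fairness gap at trained w:", round(fairness_gap(w), 3))
```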
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that, while generating synthetic data, most GANs amplify the bias present in the training data; by removing these bias-inducing samples, the GAN focuses more on real, informative samples.
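A hedged sketch of the filtering idea: flag rows that push the groupwise label disparity apart, drop a fraction of them, and train the GAN of choice on the remainder. The scoring rule below is a simple illustration, not the paper's exact criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)                                     # sensitive attribute
y = (rng.random(1000) < np.where(a == 1, 0.7, 0.4)).astype(int)  # biased labels

gap_before = y[a == 1].mean() - y[a == 0].mean()

# Bias-inducing rows: those widening the gap between the groups
# (positives in the favoured group, negatives in the disfavoured one).
inducing = ((a == 1) & (y == 1)) | ((a == 0) & (y == 0))
drop = inducing & (rng.random(1000) < 0.4)  # remove a fraction of them
keep = ~drop

gap_after = y[keep & (a == 1)].mean() - y[keep & (a == 0)].mean()
print(f"label gap: {gap_before:.2f} -> {gap_after:.2f}")
# The kept rows would then be used to train any GAN architecture.
```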
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
- Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society.
The risk is that they will systematically spread the bias embedded in the data.
We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
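A hedged sketch of such a framework: generate clean data from a known process, then inject one chosen bias type (here, label bias against one group) so its downstream effect can be studied in isolation. The bias model and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
a = rng.integers(0, 2, n)            # protected attribute
x = rng.standard_normal(n)
y_true = (x > 0).astype(int)         # unbiased ground-truth labels

def inject_label_bias(y, a, rate):
    """Flip positive labels to negative for group a=0 with the given rate."""
    y = y.copy()
    flip = (a == 0) & (y == 1) & (rng.random(len(y)) < rate)
    y[flip] = 0
    return y

y_biased = inject_label_bias(y_true, a, rate=0.3)
for g in (0, 1):
    print(f"group {g}: true pos rate {y_true[a == g].mean():.2f}, "
          f"biased pos rate {y_biased[a == g].mean():.2f}")
```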
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse.
We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks [0.0]
We propose a Generative Adversarial Network for tabular data generation.
We test our model in both the unconstrained and the constrained fair data generation settings.
Our model is comparatively more stable because it uses only one critic, and it also avoids major problems of the original GAN model.
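A minimal single-critic WGAN sketch on a toy two-column table. TabFairGAN itself adds a fairness-constrained training phase and handles mixed categorical/numerical columns, both omitted here; the weight clipping follows the original WGAN and is an assumption, not necessarily the paper's exact critic regularization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "real" table with two numerical columns.
real = torch.randn(1000, 2) * torch.tensor([1.0, 0.5]) + torch.tensor([2.0, -1.0])

G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # generator
C = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # single critic

g_opt = torch.optim.RMSprop(G.parameters(), lr=5e-4)
c_opt = torch.optim.RMSprop(C.parameters(), lr=5e-4)

for step in range(500):
    for _ in range(5):                            # critic steps per generator step
        x = real[torch.randint(0, 1000, (64,))]
        fake = G(torch.randn(64, 4)).detach()
        c_loss = C(fake).mean() - C(x).mean()     # Wasserstein critic loss
        c_opt.zero_grad(); c_loss.backward(); c_opt.step()
        for p in C.parameters():                  # weight clipping (original WGAN)
            p.data.clamp_(-0.1, 0.1)
    g_loss = -C(G(torch.randn(64, 4))).mean()     # generator maximizes critic score
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print("real column means:", real.mean(0))
print("fake column means:", G(torch.randn(1000, 4)).mean(0).data)
```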
arXiv Detail & Related papers (2021-09-02T01:48:01Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.