How Far Can Fairness Constraints Help Recover From Biased Data?
- URL: http://arxiv.org/abs/2312.10396v4
- Date: Sat, 1 Jun 2024 12:50:44 GMT
- Title: How Far Can Fairness Constraints Help Recover From Biased Data?
- Authors: Mohit Sharma, Amit Deshpande,
- Abstract summary: A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen.
Contrary to this belief, Blum & Stangl show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution.
- Score: 9.430687114814997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen. Contrary to this belief, Blum & Stangl (2019) show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution. Their result is interesting because it demonstrates that fairness constraints can implicitly rectify data bias and simultaneously overcome a perceived fairness-accuracy trade-off. Their data bias model simulates under-representation and label bias in underprivileged population, and they show the above result on a stylized data distribution with i.i.d. label noise, under simple conditions on the data distribution and bias parameters. We propose a general approach to extend the result of Blum & Stangl (2019) to different fairness constraints, data bias models, data distributions, and hypothesis classes. We strengthen their result, and extend it to the case when their stylized distribution has labels with Massart noise instead of i.i.d. noise. We prove a similar recovery result for arbitrary data distributions using fair reject option classifiers. We further generalize it to arbitrary data distributions and arbitrary hypothesis classes, i.e., we prove that for any data distribution, if the optimally accurate classifier in a given hypothesis class is fair and robust, then it can be recovered through fair classification with equal opportunity constraints on the biased distribution whenever the bias parameters satisfy certain simple conditions. Finally, we show applications of our technique to time-varying data bias in classification and fair machine learning pipelines.
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - On the Power of Randomization in Fair Classification and Representation [4.423655941492362]
We show the power of randomization to minimize the loss of accuracy that results when we impose fairness constraints.
We construct DP-fair, EO-fair, and PE-fair representations that have provably optimal accuracy and suffer no accuracy loss compared to the optimal DP-fair, EO-fair, and PE-fair classifiers respectively on the original data distribution.
arXiv Detail & Related papers (2024-06-05T10:55:11Z) - Distributionally Generative Augmentation for Fair Facial Attribute Classification [69.97710556164698]
Facial Attribute Classification (FAC) holds substantial promise in widespread applications.
FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations.
This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.
arXiv Detail & Related papers (2024-03-11T10:50:53Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample n by frac1p(u_n|b_n) or sampling that sample with a weight proportional to frac1p(u_n|b_n).
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Chasing Fairness Under Distribution Shift: A Model Weight Perturbation
Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR)
arXiv Detail & Related papers (2023-03-06T17:19:23Z) - On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z) - Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z) - How Robust is Your Fairness? Evaluating and Sustaining Fairness under
Unseen Distribution Shifts [107.72786199113183]
We propose a novel fairness learning method termed CUrvature MAtching (CUMA)
CUMA achieves robust fairness generalizable to unseen domains with unknown distributional shifts.
We evaluate our method on three popular fairness datasets.
arXiv Detail & Related papers (2022-07-04T02:37:50Z) - DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative
Networks [71.6879432974126]
We introduce DECAF: a GAN-based fair synthetic data generator for tabular data.
We show that DECAF successfully removes undesired bias and is capable of generating high-quality synthetic data.
We provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
arXiv Detail & Related papers (2021-10-25T12:39:56Z) - Bias-Tolerant Fair Classification [20.973916494320246]
label bias and selection bias are two reasons in data that will hinder the fairness of machine-learning outcomes.
We propose a Bias-TolerantFAirRegularizedLoss (B-FARL) which tries to regain the benefits using data affected by label bias and selection bias.
B-FARL takes the biased data as input, calls a model that approximates the one trained with fair but latent data, and thus prevents discrimination without constraints required.
arXiv Detail & Related papers (2021-07-07T13:31:38Z) - Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? [11.435833538081557]
Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution.
We examine the ability of fairness-constrained ERM to correct this problem.
We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity.
arXiv Detail & Related papers (2019-12-02T22:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.