Optimal Pre-Processing to Achieve Fairness and Its Relationship with
Total Variation Barycenter
- URL: http://arxiv.org/abs/2101.06811v1
- Date: Mon, 18 Jan 2021 00:20:32 GMT
- Title: Optimal Pre-Processing to Achieve Fairness and Its Relationship with
Total Variation Barycenter
- Authors: Farhad Farokhi
- Abstract summary: We use disparate impact, i.e., the extent to which the probability of observing an output depends on protected attributes such as race and gender, to measure fairness.
We then use pre-processing, also known as data repair, to enforce fairness.
We show that utility degradation, i.e., the extent to which the success of a forecasting model changes when pre-processing the data, is upper bounded by the total variation distance between the distributions of the data before and after pre-processing.
- Score: 14.497406777219112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We use disparate impact, i.e., the extent to which the probability
of observing an output depends on protected attributes such as race and gender,
to measure fairness. We prove that disparate impact is upper bounded by the
total variation distance between the conditional distributions of the inputs
given the protected attributes. We then use pre-processing, also known as data
repair, to enforce
fairness. We show that utility degradation, i.e., the extent to which the
success of a forecasting model changes when pre-processing the data, is upper
bounded by the total variation distance between the distributions of the data
before and
after pre-processing. Hence, the problem of finding the optimal pre-processing
regimen for enforcing fairness can be cast as minimizing the total variation
distance between the distributions of the data before and after pre-processing,
subject to a constraint on the total variation distance between the conditional
distributions of the inputs given the protected attributes. This problem is a linear
program that can be efficiently solved. We show that this problem is intimately
related to finding the barycenter (i.e., center of mass) of two distributions
when distances in the probability space are measured by total variation
distance. We also investigate the effect of differential privacy on fairness
using the proposed the total variation distances. We demonstrate the results
using numerical experimentation with a practice dataset.
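The linear program described in the abstract can be written out concretely. Below is a minimal sketch, assuming discrete inputs over a small finite alphabet and two protected groups; the group input distributions `p0`, `p1`, the group weights `w0`, `w1`, and the fairness budget `eps` are illustrative placeholders, not values from the paper. Auxiliary variables `u`, `v`, `s` linearize the absolute values inside the total variation terms.

```python
import numpy as np
from scipy.optimize import linprog

def repair_lp(p0, p1, w0, w1, eps):
    """Optimal pre-processing as a linear program (sketch).

    Finds repaired conditional input distributions q0, q1 minimising the
    group-weighted total variation distance to the originals p0, p1,
    subject to TV(q0, q1) <= eps, which caps disparate impact.
    Variable layout: x = [q0, q1, u, v, s], each of length n, with
    u >= |q0 - p0|, v >= |q1 - p1|, s >= |q0 - q1|.
    """
    n = len(p0)
    I, Z = np.eye(n), np.zeros((n, n))

    # Objective: 0.5 * (w0 * sum(u) + w1 * sum(v)), the utility-loss bound
    c = np.concatenate([np.zeros(2 * n),
                        0.5 * w0 * np.ones(n),
                        0.5 * w1 * np.ones(n),
                        np.zeros(n)])

    A_ub = np.block([
        [ I,  Z, -I,  Z,  Z],   #  q0 - u <=  p0
        [-I,  Z, -I,  Z,  Z],   # -q0 - u <= -p0
        [ Z,  I,  Z, -I,  Z],   #  q1 - v <=  p1
        [ Z, -I,  Z, -I,  Z],   # -q1 - v <= -p1
        [ I, -I,  Z,  Z, -I],   #  q0 - q1 - s <= 0
        [-I,  I,  Z,  Z, -I],   # -q0 + q1 - s <= 0
    ])
    b_ub = np.concatenate([p0, -p0, p1, -p1, np.zeros(2 * n)])

    # Fairness constraint: TV(q0, q1) = 0.5 * sum(s) <= eps
    A_ub = np.vstack([A_ub,
                      np.concatenate([np.zeros(4 * n), 0.5 * np.ones(n)])])
    b_ub = np.append(b_ub, eps)

    # q0 and q1 must each be probability distributions
    A_eq = np.zeros((2, 5 * n))
    A_eq[0, :n] = 1.0
    A_eq[1, n:2 * n] = 1.0
    b_eq = np.array([1.0, 1.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (5 * n), method="highs")
    q0, q1 = res.x[:n], res.x[n:2 * n]
    return q0, q1, res.fun  # repaired distributions, utility-loss bound

# Toy example: two groups with very different input distributions
p0 = np.array([0.7, 0.1, 0.1, 0.1])
p1 = np.array([0.1, 0.1, 0.1, 0.7])
q0, q1, cost = repair_lp(p0, p1, w0=0.5, w1=0.5, eps=0.1)
```

The LP has 5n variables and O(n) constraints, so it solves quickly for moderate alphabets; for large n one would exploit sparsity or the barycenter structure the paper develops.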
Related papers
- Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data [10.678533056953784]
We propose a discriminative approach to estimate the total variation (TV) distance between two distributions.
Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance.
We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved.
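The link between classification and TV distance can be sketched as follows (an illustrative example, not the paper's code): with balanced samples from two distributions, the best classifier's accuracy equals (1 + TV)/2, so 2*acc - 1 estimates the TV distance. The Gaussian example and the hand-rolled logistic fit below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two 1-D Gaussians with equal variance; their true TV distance is
# 2 * Phi(|mu1 - mu0| / 2) - 1, about 0.547 for means 0 and 1.5.
x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(1.5, 1.0, n)])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Logistic regression fit by plain gradient descent (self-contained)
w, b = 0.0, 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

# With balanced classes, the best achievable accuracy is (1 + TV) / 2,
# so 2 * acc - 1 converts classification accuracy into a TV estimate.
p = 1.0 / (1.0 + np.exp(-(w * x + b)))
acc = np.mean((p > 0.5) == (y == 1))
tv_est = 2.0 * acc - 1.0
```

Because any realizable classifier only lower-bounds the Bayes accuracy, 2*acc - 1 is in general a lower bound on the TV distance; it is consistent here because the logistic family contains the Bayes rule for equal-variance Gaussians.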
arXiv Detail & Related papers (2024-05-24T08:18:09Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Undersmoothing Causal Estimators with Generative Trees [0.0]
Inferring individualised treatment effects from observational data can unlock the potential for targeted interventions.
It is, however, hard to infer these effects from observational data.
In this paper, we explore a novel generative tree based approach that tackles model misspecification directly.
arXiv Detail & Related papers (2022-03-16T11:59:38Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination.
Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD).
We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z)
- Bias and Variance of Post-processing in Differential Privacy [53.29035917495491]
Post-processing immunity is a fundamental property of differential privacy.
It is often argued that post-processing may introduce bias and increase variance.
This paper takes a first step towards understanding the properties of post-processing.
arXiv Detail & Related papers (2020-10-09T02:12:54Z)
- DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
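The discriminative construction can be sketched as follows (an illustrative approximation, not the authors' implementation): label joint samples 1 and shuffled, product-of-marginals samples 0; the fitted classifier's log-odds on joint samples then estimate the pointwise log density ratio, and their average estimates the mutual information. The Gaussian pair and the quadratic feature map are assumptions chosen so the example is exactly realizable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Correlated Gaussian pair; true mutual information is
# -0.5 * log(1 - rho**2), about 0.51 nats for rho = 0.8.
rho = 0.8
x = rng.normal(size=n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

def features(a, b):
    # Quadratic features: the true Gaussian log density ratio is a
    # quadratic form, so this classifier family contains it exactly.
    return np.column_stack([a, b, a * b, a**2, b**2, np.ones_like(a)])

joint = features(x, y)                    # samples from p(x, y)
marg = features(x, rng.permutation(y))    # samples from p(x) p(y)
F = np.vstack([joint, marg])
labels = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic regression fit by Newton's method on the joint-vs-marginal task
w = np.zeros(F.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-F @ w))
    grad = F.T @ (p - labels) / len(labels)
    H = (F * (p * (1 - p))[:, None]).T @ F / len(labels)
    w -= np.linalg.solve(H + 1e-8 * np.eye(len(w)), grad)

# On joint samples the classifier's log-odds estimate the pointwise
# log density ratio log p(x, y) / (p(x) p(y)); their mean is the MI.
mi_est = float(np.mean(joint @ w))
```

Shuffling `y` against `x` is a standard trick for drawing approximate samples from the product of marginals without extra data.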
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.