A Statistical Test for Probabilistic Fairness
- URL: http://arxiv.org/abs/2012.04800v1
- Date: Wed, 9 Dec 2020 00:20:02 GMT
- Title: A Statistical Test for Probabilistic Fairness
- Authors: Bahar Taskesen, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen
- Abstract summary: We propose a statistical hypothesis test for detecting unfair classifiers.
We show both theoretically and empirically that the proposed test is asymptotically correct.
In addition, the proposed framework offers interpretability by identifying the most favorable perturbation of the data so that the given classifier becomes fair.
- Score: 11.95891442664266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithms are now routinely used to make consequential decisions that affect
human lives. Examples include college admissions, medical interventions or law
enforcement. While algorithms empower us to harness all information hidden in
vast amounts of data, they may inadvertently amplify existing biases in the
available datasets. This concern has sparked increasing interest in fair
machine learning, which aims to quantify and mitigate algorithmic
discrimination. Indeed, machine learning models should undergo intensive tests
to detect algorithmic biases before being deployed at scale. In this paper, we
use ideas from the theory of optimal transport to propose a statistical
hypothesis test for detecting unfair classifiers. Leveraging the geometry of
the feature space, the test statistic quantifies the distance of the empirical
distribution supported on the test samples to the manifold of distributions
that render a pre-trained classifier fair. We develop a rigorous hypothesis
testing mechanism for assessing the probabilistic fairness of any pre-trained
logistic classifier, and we show both theoretically as well as empirically that
the proposed test is asymptotically correct. In addition, the proposed
framework offers interpretability by identifying the most favorable
perturbation of the data so that the given classifier becomes fair.
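The abstract does not spell out the fairness notion being tested, so the following is only a minimal sketch of the kind of quantity such a test could be built around. It assumes a probabilistic equal-opportunity criterion (equal mean predicted probability across the two sensitive groups among positively labeled samples) and computes the empirical gap for a pre-trained logistic classifier. It is not the paper's optimal-transport test statistic, and the names `logistic_predict`, `equal_opportunity_gap`, the toy data, and the weight vectors are all hypothetical.

```python
import numpy as np


def logistic_predict(X, weights, bias=0.0):
    """Predicted probability of the positive class for a logistic classifier."""
    return 1.0 / (1.0 + np.exp(-(X @ weights + bias)))


def equal_opportunity_gap(X, a, y, weights, bias=0.0):
    """Mean predicted probability in group a == 1 minus that in group a == 0,
    restricted to samples with a positive label (y == 1).

    Assumption: this is the criterion a 'probabilistically fair' classifier
    would drive to zero; the abstract itself does not define it.
    """
    probs = logistic_predict(X, weights, bias)
    pos_group1 = (y == 1) & (a == 1)
    pos_group0 = (y == 1) & (a == 0)
    if not pos_group1.any() or not pos_group0.any():
        raise ValueError("each group needs at least one positive sample")
    return probs[pos_group1].mean() - probs[pos_group0].mean()


# Toy data (hypothetical): the first feature is shifted for group 1,
# the second feature is independent of the sensitive attribute.
rng = np.random.default_rng(0)
n = 1000
a = rng.integers(0, 2, size=n)   # sensitive attribute
y = rng.integers(0, 2, size=n)   # true label
X = rng.normal(size=(n, 2))
X[:, 0] += 1.5 * a

# A classifier that ignores the shifted feature versus one that relies on it.
fair_gap = equal_opportunity_gap(X, a, y, np.array([0.0, 1.0]))
unfair_gap = equal_opportunity_gap(X, a, y, np.array([1.0, 0.0]))
print(f"gap using only the neutral feature: {fair_gap:.3f}")
print(f"gap using only the shifted feature: {unfair_gap:.3f}")
```

A nonzero empirical gap alone does not show unfairness, since it may be due to sampling noise. The hypothesis test described in the abstract addresses exactly this by measuring how far the empirical distribution lies from the set of distributions under which the classifier is fair.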
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z)
- Provable Detection of Propagating Sampling Bias in Prediction Models [1.7709344190822935]
We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage.
Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data.
We demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.
arXiv Detail & Related papers (2023-02-13T23:39:35Z)
- Bounding Counterfactuals under Selection Bias [60.55840896782637]
We propose a first algorithm to address both identifiable and unidentifiable queries.
We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal.
arXiv Detail & Related papers (2022-07-26T10:33:10Z)
- A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms [19.86635585740634]
We present the conceptual idea and a first implementation of a bias-injection sandbox tool to investigate fairness consequences of various biases.
Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline.
In particular, we can test whether a given remedy alleviates the injected bias by comparing the predictions obtained after the intervention with the true labels in the unbiased regime, that is, before any bias injection.
arXiv Detail & Related papers (2022-04-21T16:12:19Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Testing Group Fairness via Optimal Transport Projections [12.972104025246091]
The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data.
The statistical challenges, which may arise from multiple impact criteria that define group fairness, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models.
The proposed framework can also be used to test composite fairness hypotheses and fairness with multiple sensitive attributes.
arXiv Detail & Related papers (2021-06-02T10:51:39Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias, one for the case where test data is available and one for the case where it is unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)