Statistical Inference for Fairness Auditing
- URL: http://arxiv.org/abs/2305.03712v2
- Date: Thu, 8 Jun 2023 05:51:40 GMT
- Title: Statistical Inference for Fairness Auditing
- Authors: John J. Cherian, Emmanuel J. Candès
- Abstract summary: We frame this task, known as "fairness auditing," in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
- Score: 4.318555434063274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Before deploying a black-box model in high-stakes problems, it is important
to evaluate the model's performance on sensitive subpopulations. For example,
in a recidivism prediction task, we may wish to identify demographic groups for
which our prediction model has unacceptably high false positive rates or
certify that no such groups exist. In this paper, we frame this task, often
referred to as "fairness auditing," in terms of multiple hypothesis testing. We
show how the bootstrap can be used to simultaneously bound performance
disparities over a collection of groups with statistical guarantees. Our
methods can be used to flag subpopulations affected by model underperformance,
and certify subpopulations for which the model performs adequately. Crucially,
our audit is model-agnostic and applicable to nearly any performance metric or
group fairness criterion. Our methods also accommodate extremely rich -- even
infinite -- collections of subpopulations. Further, we generalize beyond
subpopulations by showing how to assess performance over certain distribution
shifts. We test the proposed methods on benchmark datasets in predictive
inference and algorithmic fairness and find that our audits can provide
interpretable and trustworthy guarantees.
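To make the core idea concrete, here is a minimal sketch of a max-over-groups bootstrap audit, assuming a binary classifier audited for false positive rate disparities on a held-out evaluation set. Everything here (function names, the FPR metric, the defaults B and alpha) is an illustrative assumption, not the authors' implementation, which also covers studentized statistics, other metrics, and infinite collections of subpopulations.

```python
import numpy as np

def fpr(y_true, y_pred):
    """False positive rate: fraction of true negatives predicted positive."""
    neg = y_true == 0
    return y_pred[neg].mean() if neg.any() else np.nan

def audit_disparities(y_true, y_pred, groups, B=2000, alpha=0.05, seed=0):
    """Simultaneous (1 - alpha) confidence bounds on each group's FPR minus
    the overall FPR, via a bootstrap of the maximum deviation over groups."""
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)
    n = len(y_true)
    overall = fpr(y_true, y_pred)
    point = np.array([fpr(y_true[groups == g], y_pred[groups == g]) - overall
                      for g in labels])
    max_dev = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        yt, yp, gr = y_true[idx], y_pred[idx], groups[idx]
        boot_overall = fpr(yt, yp)
        boot = np.array([fpr(yt[gr == g], yp[gr == g]) - boot_overall
                         for g in labels])
        max_dev[b] = np.nanmax(np.abs(boot - point))  # worst deviation over groups
    q = np.quantile(max_dev[~np.isnan(max_dev)], 1 - alpha)
    return {g: (d - q, d + q) for g, d in zip(labels, point)}
```

Because the quantile is taken over the maximum deviation across all groups at once, the resulting intervals hold simultaneously: groups whose interval lies entirely above zero can be flagged for underperformance, and groups whose interval lies entirely below a chosen tolerance can be certified, matching the flag/certify uses described above.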
Related papers
- Trustworthy Classification through Rank-Based Conformal Prediction Sets [9.559062601251464]
We propose a novel conformal prediction method that employs a rank-based score function suitable for classification models.
Our approach constructs prediction sets that achieve the desired coverage rate while managing their size.
Our contributions include a novel conformal prediction method, theoretical analysis, and empirical evaluation (a generic sketch of rank-based prediction sets appears after this list).
arXiv Detail & Related papers (2024-07-05T10:43:41Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- A Brief Tutorial on Sample Size Calculations for Fairness Audits [6.66743248310448]
This tutorial provides guidance on how to determine the required subgroup sample sizes for a fairness audit.
Our findings are applicable to audits of binary classification models and multiple fairness metrics derived as summaries of the confusion matrix (a back-of-envelope version of such a calculation is sketched after this list).
arXiv Detail & Related papers (2023-12-07T22:59:12Z)
- Consistent Range Approximation for Fair Predictive Modeling [10.613912061919775]
The framework builds predictive models that are certifiably fair on the target population, regardless of the availability of external data during training.
The framework's efficacy is demonstrated through evaluations on real data, showing substantial improvement over existing state-of-the-art methods.
arXiv Detail & Related papers (2022-12-21T08:27:49Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Testing Group Fairness via Optimal Transport Projections [12.972104025246091]
The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data.
The statistical challenges, which may arise from multiple impact criteria that define group fairness, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models.
The proposed framework can also be used to test composite intrinsic fairness hypotheses and fairness with multiple sensitive attributes.
arXiv Detail & Related papers (2021-06-02T10:51:39Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Achieving Equalized Odds by Resampling Sensitive Attributes [13.114114427206678]
We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness.
A differentiable discrepancy functional that quantifies violations of this criterion is used as a penalty, driving the model parameters towards equalized odds (a generic surrogate is sketched after this list).
We develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature.
arXiv Detail & Related papers (2020-06-08T00:18:34Z)
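As referenced in the rank-based conformal prediction entry above, here is a generic split-conformal sketch using a simple rank score (the position of the true class when classes are sorted by predicted probability). The score function and size-management details in the cited paper differ; all names and defaults here are assumptions for illustration.

```python
import numpy as np

def rank_score(probs, labels):
    """Rank (1 = most probable class) of the true class in each row of probs."""
    order = np.argsort(-probs, axis=1)             # classes sorted by probability
    ranks = np.empty_like(order)
    rows = np.arange(len(probs))[:, None]
    ranks[rows, order] = np.arange(probs.shape[1])[None, :] + 1
    return ranks[np.arange(len(probs)), labels]

def rank_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Prediction sets containing every class ranked at or above a cutoff
    calibrated so the true class is covered with probability >= 1 - alpha."""
    n = len(cal_labels)
    scores = np.sort(rank_score(cal_probs, cal_labels))
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)  # finite-sample correction
    qhat = scores[k - 1]                             # calibrated rank cutoff
    order = np.argsort(-test_probs, axis=1)
    return [set(row[:qhat]) for row in order]
```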
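For the sample-size tutorial entry, here is a back-of-envelope two-proportion power calculation; this standard approximation is an assumption for illustration, not the tutorial's exact procedure.

```python
import math
from scipy.stats import norm

def audit_sample_size(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group n needed to detect a gap between two
    confusion-matrix rates (e.g., subgroup FPRs p1 vs. p2) with a
    two-sided level-alpha z-test at the requested power."""
    z_a = norm.ppf(1 - alpha / 2)    # two-sided critical value
    z_b = norm.ppf(power)            # quantile delivering the target power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)
```

For example, audit_sample_size(0.10, 0.15) gives 683, i.e., roughly 700 true negatives per subgroup to reliably detect a five-percentage-point FPR gap.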
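And for the equalized-odds entry, a generic first-moment surrogate for the equalized-odds gap; the cited paper introduces its own discrepancy functional, so this simpler penalty is purely illustrative. It is written in numpy for clarity; during training it would be computed on framework tensors so gradients can flow.

```python
import numpy as np

def equalized_odds_gap(scores, y, a):
    """Sum over labels of the absolute gap in mean scores between the two
    groups; zero when group-conditional mean scores match for both labels.
    Assumes y and a are binary and every (label, group) cell is non-empty."""
    gap = 0.0
    for label in (0, 1):
        g0 = scores[(y == label) & (a == 0)].mean()
        g1 = scores[(y == label) & (a == 1)].mean()
        gap += abs(g0 - g1)
    return gap
```

Minimizing a training loss plus lambda times this gap pushes the model's group-conditional score distributions together, a relaxation of the equalized odds criterion.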