A Brief Tutorial on Sample Size Calculations for Fairness Audits
- URL: http://arxiv.org/abs/2312.04745v1
- Date: Thu, 7 Dec 2023 22:59:12 GMT
- Title: A Brief Tutorial on Sample Size Calculations for Fairness Audits
- Authors: Harvineet Singh, Fan Xia, Mi-Ok Kim, Romain Pirracchio, Rumi Chunara,
Jean Feng
- Abstract summary: This tutorial provides guidance on how to determine the required subgroup sample sizes for a fairness audit.
Our findings are applicable to audits of binary classification models and multiple fairness metrics derived as summaries of the confusion matrix.
- Score: 6.66743248310448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In fairness audits, a standard objective is to detect whether a given
algorithm performs substantially differently between subgroups. Properly
powering the statistical analysis of such audits is crucial for obtaining
informative fairness assessments, as it ensures a high probability of detecting
unfairness when it exists. However, limited guidance is available on the amount
of data necessary for a fairness audit, lacking directly applicable results
concerning commonly used fairness metrics. Additionally, the consideration of
unequal subgroup sample sizes is also missing. In this tutorial, we address
these issues by providing guidance on how to determine the required subgroup
sample sizes to maximize the statistical power of hypothesis tests for
detecting unfairness. Our findings are applicable to audits of binary
classification models and multiple fairness metrics derived as summaries of the
confusion matrix. Furthermore, we discuss other aspects of audit study designs
that can increase the reliability of audit results.
Related papers
- Sampling Audit Evidence Using a Naive Bayes Classifier [0.0]
This study advances sampling techniques by integrating machine learning with sampling.
Machine learning integration helps avoid sampling bias, keep randomness and variability, and target risker samples.
arXiv Detail & Related papers (2024-03-21T01:35:03Z) - A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z) - A Universal Unbiased Method for Classification from Aggregate
Observations [115.20235020903992]
This paper presents a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses.
Our proposed method not only guarantees the risk consistency due to the unbiased risk estimator but also can be compatible with arbitrary losses.
arXiv Detail & Related papers (2023-06-20T07:22:01Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Statistical Inference for Fairness Auditing [4.318555434063274]
We frame this task as "fairness auditing," in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
arXiv Detail & Related papers (2023-05-05T17:54:22Z) - Error Parity Fairness: Testing for Group Fairness in Regression Tasks [5.076419064097733]
This work presents error parity as a regression fairness notion and introduces a testing methodology to assess group fairness.
It is followed by a suitable permutation test to compare groups on several statistics to explore disparities and identify impacted groups.
Overall, the proposed regression fairness testing methodology fills a gap in the fair machine learning literature and may serve as a part of larger accountability assessments and algorithm audits.
arXiv Detail & Related papers (2022-08-16T17:47:20Z) - Measuring Fairness Under Unawareness of Sensitive Attributes: A
Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z) - Testing Group Fairness via Optimal Transport Projections [12.972104025246091]
The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are to the perturbation or due to the randomness in the data.
The statistical challenges, which may arise from multiple impact criteria that define group fairness, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models.
The proposed framework can also be used to test for testing composite intrinsic fairness hypotheses and fairness with multiple sensitive attributes.
arXiv Detail & Related papers (2021-06-02T10:51:39Z) - A Statistical Test for Probabilistic Fairness [11.95891442664266]
We propose a statistical hypothesis test for detecting unfair classifiers.
We show both theoretically as well as empirically that the proposed test is correct.
In addition, the proposed framework offers interpretability by identifying the most favorable perturbation of the data.
arXiv Detail & Related papers (2020-12-09T00:20:02Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.