Auditing Fairness by Betting
- URL: http://arxiv.org/abs/2305.17570v2
- Date: Sun, 29 Oct 2023 17:40:24 GMT
- Title: Auditing Fairness by Betting
- Authors: Ben Chugg, Santiago Cortes-Gomez, Bryan Wilder, Aaditya Ramdas
- Abstract summary: We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models.
Our methods are sequential and allow for the continuous monitoring of incoming data.
We demonstrate the efficacy of our approach on three benchmark fairness datasets.
- Score: 47.53732591434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide practical, efficient, and nonparametric methods for auditing the
fairness of deployed classification and regression models. Whereas previous
work relies on a fixed-sample size, our methods are sequential and allow for
the continuous monitoring of incoming data, making them highly amenable to
tracking the fairness of real-world systems. We also allow the data to be
collected by a probabilistic policy as opposed to sampled uniformly from the
population. This enables auditing to be conducted on data gathered for another
purpose. Moreover, this policy may change over time and different policies may
be used on different subpopulations. Finally, our methods can handle
distribution shift resulting from either changes to the model or changes in the
underlying population. Our approach is based on recent progress in
anytime-valid inference and game-theoretic statistics, the "testing by betting"
framework in particular. These connections ensure that our methods are
interpretable, fast, and easy to implement. We demonstrate the efficacy of our
approach on three benchmark fairness datasets.
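The "testing by betting" idea behind the paper can be sketched in a few lines: an auditor starts with unit wealth and, at each time step, bets a predictable fraction of it on a payoff that has mean zero when the two groups are treated equally; by Ville's inequality, wealth rarely grows large under the null, so crossing 1/alpha is evidence of unfairness. This is a minimal illustration only, assuming bounded outcomes and a simple capped running-mean betting rule, not the paper's actual estimators or policies; the function and parameter names are hypothetical.

```python
import random

def betting_audit(pairs, alpha=0.05):
    """Sequential audit of H0: E[a] == E[b] via a betting wealth process.

    pairs: iterable of (a, b) outcome pairs with values in [0, 1].
    Returns the step at which H0 is rejected (wealth >= 1/alpha), or None.
    Under H0 the wealth is a nonnegative martingale, so by Ville's
    inequality it exceeds 1/alpha with probability at most alpha.
    """
    wealth = 1.0
    mean_diff = 0.0  # running mean of past payoffs, used to choose the bet
    for t, (a, b) in enumerate(pairs, start=1):
        g = a - b                             # payoff in [-1, 1], zero-mean under H0
        lam = max(-0.5, min(0.5, mean_diff))  # predictable bet, capped so wealth > 0
        wealth *= 1.0 + lam * g
        mean_diff += (g - mean_diff) / t      # update AFTER betting (predictability)
        if wealth >= 1.0 / alpha:
            return t                          # enough evidence: reject H0
    return None

# Usage sketch: a clearly unfair stream is flagged quickly,
# while auditing can keep monitoring a fair stream indefinitely.
random.seed(0)
unfair = [(float(random.random() < 0.9), float(random.random() < 0.1))
          for _ in range(500)]
print(betting_audit(unfair))  # prints a small step index, not None
```

The bet `lam` must depend only on past data; updating `mean_diff` after the wealth update is what keeps the wealth process a valid test martingale.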
Related papers
- Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.
We derive estimators for demographic parity, equal opportunity, and conditional mutual information.
To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation [2.7759072740347017]
We introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals.
In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data.
Our method presents state-of-the-art results in comparison to existing techniques.
arXiv Detail & Related papers (2023-05-25T10:50:30Z)
- Statistical Inference for Fairness Auditing [4.318555434063274]
We frame this task as "fairness auditing" in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
arXiv Detail & Related papers (2023-05-05T17:54:22Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Adaptive Conformal Inference Under Distribution Shift [0.0]
We develop methods for forming prediction sets in an online setting where the data generating distribution is allowed to vary over time in an unknown fashion.
Our framework builds on ideas from conformal inference to provide a general wrapper that can be combined with any black box method.
We test our method, adaptive conformal inference, on two real world datasets and find that its predictions are robust to visible and significant distribution shifts.
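The core of adaptive conformal inference is a one-line online update of the working miscoverage level: after each step, the level moves toward the target depending on whether the last prediction set missed. The sketch below shows only that level update under the stated recursion alpha_{t+1} = alpha_t + gamma * (alpha - err_t); the function name and parameters are illustrative, and the wrapping of a black-box conformal score is omitted.

```python
def aci_levels(miscoverage_indicators, alpha=0.1, gamma=0.005):
    """Online level update from adaptive conformal inference.

    miscoverage_indicators: sequence of 0/1 flags, 1 if the prediction
    set at that step failed to cover the true label.
    Returns the working level alpha_t used at each step, following
        alpha_{t+1} = alpha_t + gamma * (alpha - err_t),
    which raises the level (smaller sets) after covered steps and
    lowers it (larger sets) after misses, tracking the target alpha
    even when the data distribution shifts.
    """
    a = alpha
    levels = []
    for err in miscoverage_indicators:
        levels.append(a)            # level actually used at this step
        a = a + gamma * (alpha - err)
    return levels
```

A miss (err = 1) decreases the working level, so the next prediction set is wider and coverage recovers; a long run of covered steps slowly tightens the sets again.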
arXiv Detail & Related papers (2021-06-01T01:37:32Z)
- Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy [8.807587076209566]
The goal of off-policy evaluation (OPE) is to evaluate a new policy using historical data obtained via a behavior policy.
Because the contextual bandit updates the policy based on past observations, the samples are not independent and identically distributed.
This paper tackles this problem by constructing an estimator from a martingale difference sequence (MDS) for the dependent samples.
arXiv Detail & Related papers (2020-10-23T15:22:57Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.