De-biasing "bias" measurement
- URL: http://arxiv.org/abs/2205.05770v1
- Date: Wed, 11 May 2022 20:51:57 GMT
- Title: De-biasing "bias" measurement
- Authors: Kristian Lum, Yunfeng Zhang, Amanda Bower
- Abstract summary: We show that metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying quantities they purport to represent.
We propose the "double-corrected" variance estimator, which provides unbiased estimates and uncertainty quantification of the variance of model performance across groups.
- Score: 20.049916973204102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a model's performance differs across socially or culturally relevant
groups--like race, gender, or the intersections of many such groups--it is
often called "biased." While much of the work in algorithmic fairness over the
last several years has focused on developing various definitions of model
fairness (the absence of group-wise model performance disparities) and
eliminating such "bias," much less work has gone into rigorously measuring it.
In practice, it is important to have high-quality, human-digestible measures of
model performance disparities, and associated uncertainty quantification about
them, that can serve as inputs into multi-faceted decision-making processes. In
this paper, we show both mathematically and through simulation that many of the
metrics used to measure group-wise model performance disparities are themselves
statistically biased estimators of the underlying quantities they purport to
represent. We argue that this can cause misleading conclusions about the
relative group-wise model performance disparities along different dimensions,
especially in cases where some sensitive variables consist of categories with
few members. We propose the "double-corrected" variance estimator, which
provides unbiased estimates and uncertainty quantification of the variance of
model performance across groups. It is conceptually simple and easily
implementable without a statistical software package or numerical optimization.
We demonstrate the utility of this approach through simulation and show on a
real dataset that while statistically biased estimators of group-wise model
performance disparities indicate statistically significant between-group
model performance disparities, when accounting for statistical bias in the
estimator, the estimated group-wise disparities in model performance are no
longer statistically significant.
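As a rough illustration of the statistical bias the abstract describes, the simulation sketch below compares the naive plug-in variance of group-wise accuracies against a version that subtracts an estimate of each group's sampling variance. The group sizes, accuracy value, and correction formula are illustrative assumptions, not the paper's exact "double-corrected" estimator; the point is only that when every group shares the same true accuracy, the naive estimator still reports a positive between-group variance, driven largely by the smallest groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: all groups share the same true accuracy, so the
# true between-group variance of model performance is exactly zero.
true_acc = 0.80
group_sizes = [15, 20, 30, 50, 400]  # a few small groups and one large group

naive_vars, corrected_vars = [], []
for _ in range(5000):
    # Plug-in per-group accuracies from simulated correct/incorrect counts.
    acc_hat = np.array([rng.binomial(n, true_acc) / n for n in group_sizes])

    # Naive metric: sample variance of the estimated group accuracies.
    # It is biased upward because each estimate carries sampling noise,
    # and the smallest groups contribute the most noise.
    naive_vars.append(acc_hat.var(ddof=1))

    # Bias correction: subtract an unbiased estimate of each group's
    # sampling variance, p_hat * (1 - p_hat) / (n - 1). This is only a
    # first-order stand-in for the paper's double-corrected estimator,
    # which additionally corrects the uncertainty of this quantity.
    sampling_var = np.array(
        [a * (1 - a) / (n - 1) for a, n in zip(acc_hat, group_sizes)]
    )
    corrected_vars.append(acc_hat.var(ddof=1) - sampling_var.mean())

print("true between-group variance:             0.0")
print(f"naive estimator, averaged over sims:     {np.mean(naive_vars):.5f}")
print(f"corrected estimator, averaged over sims: {np.mean(corrected_vars):.5f}")
```

In this setup the naive estimate stays well above zero while the corrected one is centered near zero, mirroring the abstract's point that apparent group-wise disparities can vanish once the estimator's own statistical bias is accounted for.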
Related papers
- Comparing Fairness of Generative Mobility Models [3.699135947901772]
This work examines the fairness of generative mobility models, addressing the often overlooked dimension of equity in model performance across geographic regions.
Predictive models built on crowd flow data are instrumental in understanding urban structures and movement patterns.
We propose a novel framework for assessing fairness by measuring utility and equity of generated traces.
arXiv Detail & Related papers (2024-11-07T06:01:12Z)
- The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models [22.75594773147521]
We introduce Rank-Allocation-Based Bias Index (RABBI), a model-agnostic bias measure that assesses potential allocational harms arising from biases in large language models (LLMs).
Our results reveal that commonly-used bias metrics based on average performance gap and distribution distance fail to reliably capture group disparities in allocation outcomes.
Our work highlights the need to account for how models are used in contexts with limited resources.
arXiv Detail & Related papers (2024-08-02T14:13:06Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring a scale of models' reliance on any identified spurious feature.
We assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Simplicity Bias Leads to Amplified Performance Disparities [8.60453031364566]
We show that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class.
A model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex.
arXiv Detail & Related papers (2022-12-13T15:24:41Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
- Expected Validation Performance and Estimation of a Random Variable's Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
arXiv Detail & Related papers (2021-10-01T18:48:47Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)