On Fairness and Stability: Is Estimator Variance a Friend or a Foe?
- URL: http://arxiv.org/abs/2302.04525v1
- Date: Thu, 9 Feb 2023 09:35:36 GMT
- Title: On Fairness and Stability: Is Estimator Variance a Friend or a Foe?
- Authors: Falaah Arif Khan, Denys Herasymuk, Julia Stoyanovich
- Abstract summary: We propose a new family of performance measures based on group-wise parity in variance.
We develop and release an open-source library that reconciles uncertainty quantification techniques with fairness analysis.
- Score: 6.751310968561177
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The error of an estimator can be decomposed into a (statistical) bias term, a
variance term, and an irreducible noise term. When we do bias analysis,
formally we are asking the question: "how good are the predictions?" The role
of bias in the error decomposition is clear: if we trust the labels/targets,
then we would want the estimator to have as low bias as possible, in order to
minimize error. Fair machine learning is concerned with the question: "Are the
predictions equally good for different demographic/social groups?" This has
naturally led to a variety of fairness metrics that compare some measure of
statistical bias on subsets corresponding to socially privileged and socially
disadvantaged groups. In this paper we propose a new family of performance
measures based on group-wise parity in variance. We demonstrate when group-wise
statistical bias analysis gives an incomplete picture, and what group-wise
variance analysis can tell us in settings that differ in the magnitude of
statistical bias. We develop and release an open-source library that reconciles
uncertainty quantification techniques with fairness analysis, and use it to
conduct an extensive empirical analysis of our variance-based fairness measures
on standard benchmarks.
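The abstract's core idea, comparing prediction variance across demographic groups rather than only statistical bias, can be illustrated with a minimal sketch. The function name, the toy data, and the max-gap summary below are illustrative assumptions, not the paper's actual library API: predictions from several independently trained models are collected, per-sample variance across models is computed, and the mean variance is compared between groups.

```python
import numpy as np

def groupwise_variance_parity(per_model_preds, groups):
    """Given predictions from B independently trained models
    (shape: B x n_samples) and a group label per sample, return
    per-group mean predictive variance and the max pairwise gap."""
    per_model_preds = np.asarray(per_model_preds, dtype=float)
    # Variance across the B models for each sample (instability).
    sample_var = per_model_preds.var(axis=0)
    group_var = {g: sample_var[groups == g].mean()
                 for g in np.unique(groups)}
    gap = max(group_var.values()) - min(group_var.values())
    return group_var, gap

# Toy example: 10 models disagree far more on group "b" than "a".
rng = np.random.default_rng(0)
groups = np.array(["a"] * 50 + ["b"] * 50)
preds = np.concatenate([rng.normal(0.5, 0.05, (10, 50)),   # stable on "a"
                        rng.normal(0.5, 0.30, (10, 50))],  # unstable on "b"
                       axis=1)
group_var, gap = groupwise_variance_parity(preds, groups)
```

A parity measure of this kind can flag disparities even when both groups have similar mean error, which is the setting where the paper argues bias-only analysis gives an incomplete picture.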
Related papers
- CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk [7.755784217796677]
We propose CLEAR, a calibration method with two distinct parameters. We show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability-Computability-Stability framework. CLEAR achieves an average improvement of 28.2% and 17.4% in the interval width compared to the two individually calibrated baselines.
arXiv Detail & Related papers (2025-07-10T20:13:00Z) - Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty [1.6112718683989882]
We make use of a more fine-grained taxonomy of epistemic uncertainty sources in machine learning models. We show that high model bias can lead to misleadingly low estimates of epistemic uncertainty. Common second-order uncertainty methods systematically blur bias-induced errors into aleatoric estimates.
arXiv Detail & Related papers (2025-05-29T14:50:46Z) - Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation [0.18846515534317262]
This study presents a machine-learning-based classification framework that integrates feature selection with data augmentation techniques. We show that the proposed pipeline yields strong cross-validated performance on small datasets.
arXiv Detail & Related papers (2025-05-06T10:09:50Z) - Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching [0.49157446832511503]
We investigate whether the way training and testing data are sampled affects the reliability of fairness metrics.
Since training and test sets are often randomly sampled from the same population, bias present in the training data may still exist in the test data.
We propose FairMatch, a post-processing method that applies propensity score matching to evaluate and mitigate bias.
arXiv Detail & Related papers (2025-04-23T19:28:30Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample n by 1/p(u_n | b_n) or sampling that sample with a weight proportional to 1/p(u_n | b_n).
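The reweighting scheme above, assigning each sample a weight proportional to the inverse conditional probability 1/p(u_n | b_n), can be sketched as follows. The empirical estimation of p(u | b) by group-wise counting is an illustrative assumption; the paper may estimate it differently.

```python
import numpy as np

def inverse_propensity_weights(u, b):
    """Estimate p(u_n | b_n) empirically from the data and return
    weights proportional to 1 / p(u_n | b_n), so that (u, b)
    combinations over-represented in the data are down-weighted."""
    u, b = np.asarray(u), np.asarray(b)
    weights = np.empty(len(u), dtype=float)
    for bv in np.unique(b):
        mask = b == bv
        vals, counts = np.unique(u[mask], return_counts=True)
        p = dict(zip(vals, counts / mask.sum()))  # empirical p(u | b=bv)
        weights[mask] = [1.0 / p[uv] for uv in u[mask]]
    return weights

# Toy example: class u is strongly correlated with attribute b.
u = np.array([0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = inverse_propensity_weights(u, b)
```

Within b = 0, the rare combination (u = 1, b = 0) receives weight 1/0.25 = 4, while the common combination (u = 0, b = 0) receives 1/0.75, counteracting the class/attribute correlation.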
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep
Models [51.66015254740692]
We show that for an ensemble of deep-learning-based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z) - Towards Better Certified Segmentation via Diffusion Models [62.21617614504225]
Segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving.
Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees.
In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models.
arXiv Detail & Related papers (2023-06-16T16:30:39Z) - The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data [8.916614661563893]
We introduce a missing-at-random (decaying MAR) framework and associated approaches for doubly robust causal inference. This simultaneously addresses selection bias in the labeling mechanism and the extreme imbalance between labeled and unlabeled groups. To ensure robust causal conclusions, we propose a bias-reduced SS estimator for the average treatment effect.
arXiv Detail & Related papers (2023-05-22T07:37:12Z) - Arbitrariness and Social Prediction: The Confounding Role of Variance in
Fair Classification [31.392067805022414]
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification.
In practice, the variance on some data examples is so large that decisions can be effectively arbitrary.
We develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary.
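One simple way to realize an abstaining ensemble of the kind described above is to take a majority vote but withhold a decision when member agreement is too low. The threshold value and the use of plain agreement rate are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def abstaining_ensemble_predict(member_preds, agreement_threshold=0.8):
    """Binary predictions from an ensemble (shape: B x n_samples).
    Return the majority vote per sample, or None where agreement
    falls below the threshold (the decision is effectively arbitrary)."""
    member_preds = np.asarray(member_preds)
    pos_rate = member_preds.mean(axis=0)          # fraction voting 1
    agreement = np.maximum(pos_rate, 1 - pos_rate)
    vote = (pos_rate >= 0.5).astype(int)
    return [int(v) if a >= agreement_threshold else None
            for v, a in zip(vote, agreement)]

# Toy example: 10 members; the second sample splits 5/5, so we abstain.
preds = np.array([[1] * 10,                 # unanimous 1
                  [1] * 5 + [0] * 5,        # coin flip -> abstain
                  [0] * 9 + [1]]).T         # near-unanimous 0
out = abstaining_ensemble_predict(preds)
```

Abstention converts high between-model variance on a sample into an explicit "no decision", rather than an arbitrary one.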
arXiv Detail & Related papers (2023-01-27T06:52:04Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Evaluating Aleatoric Uncertainty via Conditional Generative Models [15.494774321257939]
We study conditional generative models for aleatoric uncertainty estimation.
We introduce two metrics to measure the discrepancy between two conditional distributions.
We demonstrate numerically how our metrics provide correct measurements of conditional distributional discrepancies.
arXiv Detail & Related papers (2022-06-09T05:39:04Z) - Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups in a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z) - When in Doubt: Neural Non-Parametric Uncertainty Quantification for
Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z) - Model Mis-specification and Algorithmic Bias [0.0]
Machine learning algorithms are increasingly used to inform critical decisions.
There is a growing concern about bias, that algorithms may produce uneven outcomes for individuals in different demographic groups.
In this work, we measure bias as the difference between mean prediction errors across groups.
arXiv Detail & Related papers (2021-05-31T17:45:12Z) - Counterfactual Invariance to Spurious Correlations: Why and How to Pass
Stress Tests [87.60900567941428]
A "spurious correlation" is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z) - The statistical advantage of automatic NLG metrics at the system level [10.540821585237222]
Statistically, humans are unbiased, high variance estimators, while metrics are biased, low variance estimators.
We compare these estimators by their error in pairwise prediction (which generation system is better?) using the bootstrap.
Our analysis compares the adjusted error of metrics to humans and a derived, perfect segment-level annotator, both of which are unbiased estimators dependent on the number of judgments collected.
arXiv Detail & Related papers (2021-05-26T09:53:57Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z) - Is Your Classifier Actually Biased? Measuring Fairness under Uncertainty
with Bernstein Bounds [21.598196899084268]
We use Bernstein bounds to represent uncertainty about the bias estimate as a confidence interval.
We provide empirical evidence that a 95% confidence interval consistently bounds the true bias.
Our findings suggest that the datasets currently used to measure bias are too small to conclusively identify bias except in the most egregious cases.
arXiv Detail & Related papers (2020-04-26T09:45:45Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? [11.435833538081557]
Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution.
We examine the ability of fairness-constrained ERM to correct this problem.
We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity.
arXiv Detail & Related papers (2019-12-02T22:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.