Fix Representation (Optimally) Before Fairness: Finite-Sample Shrinkage Population Correction and the True Price of Fairness Under Subpopulation Shift
- URL: http://arxiv.org/abs/2602.05707v1
- Date: Thu, 05 Feb 2026 14:32:05 GMT
- Title: Fix Representation (Optimally) Before Fairness: Finite-Sample Shrinkage Population Correction and the True Price of Fairness Under Subpopulation Shift
- Authors: Amir Asiaee, Kaveh Aryan,
- Abstract summary: Machine learning practitioners frequently observe tension between predictive accuracy and group fairness constraints.<n>We show that both phenomena can be artifacts of training data that misrepresents subgroup proportions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning practitioners frequently observe tension between predictive accuracy and group fairness constraints -- yet sometimes fairness interventions appear to improve accuracy. We show that both phenomena can be artifacts of training data that misrepresents subgroup proportions. Under subpopulation shift (stable within-group distributions, shifted group proportions), we establish: (i) full importance-weighted correction is asymptotically unbiased but finite-sample suboptimal; (ii) the optimal finite-sample correction is a shrinkage reweighting that interpolates between target and training mixtures; (iii) apparent "fairness helps accuracy" can arise from comparing fairness methods to an improperly-weighted baseline. We provide an actionable evaluation protocol: fix representation (optimally) before fairness -- compare fairness interventions against a shrinkage-corrected baseline to isolate the true, irreducible price of fairness. Experiments on synthetic and real-world benchmarks (Adult, COMPAS) validate our theoretical predictions and demonstrate that this protocol eliminates spurious tradeoffs, revealing the genuine fairness-utility frontier.
Related papers
- The Statistical Fairness-Accuracy Frontier [50.323024516295725]
Machine learning models must balance accuracy and fairness, but these goals often conflict.<n>A useful tool for understanding this trade-off is the fairness-accuracy frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy.<n>We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them.
arXiv Detail & Related papers (2025-08-25T03:01:35Z) - Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making.
We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z) - Consistent End-to-End Estimation for Counterfactual Fairness [56.9060492313073]
We propose a novel counterfactual fairness predictor for making predictions under counterfactual fairness.<n>We provide theoretical guarantees that our method is effective in ensuring the notion of counterfactual fairness.
arXiv Detail & Related papers (2023-10-26T17:58:39Z) - Understanding Fairness Surrogate Functions in Algorithmic Fairness [21.555040357521907]
We show that there is a surrogate-fairness gap between the fairness definition and the fairness surrogate function.
We elaborate a novel and general algorithm called Balanced Surrogate, which iteratively reduces the gap to mitigate unfairness.
arXiv Detail & Related papers (2023-10-17T12:40:53Z) - Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff
with few Unlabeled Test Samples [21.144077993862652]
We operate in the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available.
We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines.
We show that our proposed method significantly outperforms them.
arXiv Detail & Related papers (2023-10-11T14:39:51Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - RobustFair: Adversarial Evaluation through Fairness Confusion Directed
Gradient Search [8.278129731168127]
Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations.
This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to false or biased perturbations.
arXiv Detail & Related papers (2023-05-18T12:07:29Z) - Fair-CDA: Continuous and Directional Augmentation for Group Fairness [48.84385689186208]
We propose a fine-grained data augmentation strategy for imposing fairness constraints.
We show that group fairness can be achieved by regularizing the models on transition paths of sensitive features between groups.
Our proposed method does not assume any data generative model and ensures good generalization for both accuracy and fairness.
arXiv Detail & Related papers (2023-04-01T11:23:00Z) - Chasing Fairness Under Distribution Shift: A Model Weight Perturbation
Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR)
arXiv Detail & Related papers (2023-03-06T17:19:23Z) - On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z) - Increasing Fairness via Combination with Learning Guarantees [8.116369626226815]
We propose a fairness quality measure named 'discriminative risk (DR)' to reflect both individual and group fairness aspects.<n>We investigate its properties and establish the first- and second-order oracle bounds to show that fairness can be boosted via ensemble combination with theoretical learning guarantees.
arXiv Detail & Related papers (2023-01-25T20:31:06Z) - How Robust is Your Fairness? Evaluating and Sustaining Fairness under
Unseen Distribution Shifts [107.72786199113183]
We propose a novel fairness learning method termed CUrvature MAtching (CUMA)
CUMA achieves robust fairness generalizable to unseen domains with unknown distributional shifts.
We evaluate our method on three popular fairness datasets.
arXiv Detail & Related papers (2022-07-04T02:37:50Z) - Debiasing classifiers: is reality at variance with expectation? [9.730485257882433]
We show that debiasers often fail in practice to generalize out-of-sample data, and can in fact make fairness worse rather than better.
Considering fairness--performance trade-offs justifies the counterintuitive notion that partial debiasing can actually yield better results in practice on out-of-sample data.
arXiv Detail & Related papers (2020-11-04T17:00:54Z) - Fair Regression with Wasserstein Barycenters [39.818025466204055]
We study the problem of learning a real-valued function that satisfies the Demographic Parity constraint.
It demands the distribution of the predicted output to be independent of the sensitive attribute.
We establish a connection between fair regression and optimal transport theory, based on which we derive a close form expression for the optimal fair predictor.
arXiv Detail & Related papers (2020-06-12T16:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.