FairBalance: How to Achieve Equalized Odds With Data Pre-processing
- URL: http://arxiv.org/abs/2107.08310v4
- Date: Wed, 26 Apr 2023 13:48:17 GMT
- Title: FairBalance: How to Achieve Equalized Odds With Data Pre-processing
- Authors: Zhe Yu, Joymallya Chakraborty, Tim Menzies
- Abstract summary: This research seeks to benefit the software engineering community by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software.
We propose FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data.
- Score: 32.962227796351776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research seeks to benefit the software engineering community by providing
a simple yet effective pre-processing approach to achieve equalized odds
fairness in machine learning software. Fairness issues have attracted
increasing attention since machine learning software is increasingly used for
high-stakes and high-risk decisions. Amongst all the existing fairness notions,
this work specifically targets "equalized odds" given its advantage in always
allowing perfect classifiers. Equalized odds requires that members of every
demographic group do not receive disparate mistreatment. Prior works either
optimize for an equalized-odds-related metric during the learning process,
treating the learner as a black box, or manipulate the training data following
some intuition. This
work studies the root cause of the violation of equalized odds and how to
tackle it. We found that equalizing the class distribution in each demographic
group with sample weights is a necessary condition for achieving equalized odds
without modifying the normal training process. In addition, an important
partial condition for equalized odds (zero average odds difference) can be
guaranteed when the class distributions are weighted to be not only equal but
also balanced (1:1). Based on these analyses, we propose FairBalance, a
pre-processing algorithm which balances the class distribution in each
demographic group by assigning calculated weights to the training data. On
eight real-world datasets, our empirical results show that, at low
computational overhead, the proposed pre-processing algorithm FairBalance can
significantly improve equalized odds without much, if any, damage to
utility. FairBalance also outperforms existing state-of-the-art approaches in
terms of equalized odds. To facilitate reuse, reproduction, and validation, we
made our scripts available at https://github.com/hil-se/FairBalance.
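To make the weighting scheme concrete, below is a minimal sketch of the per-(group, class) reweighting described in the abstract, together with the average odds difference metric it targets. The function names and the exact normalization (each class within a group receives total weight equal to the group size) are illustrative assumptions, not the authors' reference code; the released scripts at https://github.com/hil-se/FairBalance are authoritative.

```python
import numpy as np

def fairbalance_weights(groups, labels):
    """Per-sample weights so that, within every demographic group,
    each class receives the same total weight (a 1:1 balance).

    Sketch of the idea from the abstract; the normalization here
    (each group-class cell sums to the group size) is an assumption.
    """
    groups, labels = np.asarray(groups), np.asarray(labels)
    weights = np.empty(len(labels), dtype=float)
    for g in np.unique(groups):
        in_g = groups == g
        for y in np.unique(labels[in_g]):
            cell = in_g & (labels == y)
            # Every (group, class) cell gets total weight |group|,
            # balancing the class distribution inside each group.
            weights[cell] = in_g.sum() / cell.sum()
    return weights

def average_odds_difference(y_true, y_pred, groups):
    """Signed mean gap in TPR and FPR between binary groups 0 and 1;
    zero corresponds to 'zero average odds difference'."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    gaps = []
    for cls in (1, 0):  # TPR gap on positives, FPR gap on negatives
        r0 = y_pred[(groups == 0) & (y_true == cls)].mean()
        r1 = y_pred[(groups == 1) & (y_true == cls)].mean()
        gaps.append(r0 - r1)
    return 0.5 * sum(gaps)
```

Because the output is just a sample-weight vector, it can be passed unchanged to any learner that accepts per-sample weights, e.g. `model.fit(X, y, sample_weight=fairbalance_weights(s, y))` with a scikit-learn classifier, which is why the normal training process itself is left untouched.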
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Boosting Fair Classifier Generalization through Adaptive Priority Reweighing [59.801444556074394]
Fair algorithms with strong predictive performance and better generalizability are needed.
This paper proposes an adaptive reweighing method that mitigates the impact of distribution shifts between training and test data on model generalizability.
arXiv Detail & Related papers (2023-09-15T13:04:55Z) - FORML: Learning to Reweight Data for Fairness [2.105564340986074]
We introduce Fairness Optimized Reweighting via Meta-Learning (FORML).
FORML balances fairness constraints and accuracy by jointly optimizing training sample weights and a neural network's parameters.
We show that FORML improves equality of opportunity fairness criteria over existing state-of-the-art reweighting methods by approximately 1% on image classification tasks and by approximately 5% on a face prediction task.
arXiv Detail & Related papers (2022-02-03T17:36:07Z) - Parity-based Cumulative Fairness-aware Boosting [7.824964622317634]
Data-driven AI systems can lead to discrimination on the basis of protected attributes like gender or race.
We propose AdaFair, a fairness-aware boosting ensemble that changes the data distribution at each round.
Our experiments show that AdaFair can achieve fairness in terms of statistical parity, equal opportunity, and disparate mistreatment.
arXiv Detail & Related papers (2022-01-04T14:16:36Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z) - Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z) - A Distributionally Robust Approach to Fair Classification [17.759493152879013]
We propose a robust logistic regression model with an unfairness penalty that prevents discrimination with respect to sensitive attributes such as gender or ethnicity.
This model is equivalent to a tractable convex optimization problem if a Wasserstein ball centered at the empirical distribution on the training data is used to model distributional uncertainty.
We demonstrate that the resulting classifier improves fairness at a marginal loss of predictive accuracy on both synthetic and real datasets.
arXiv Detail & Related papers (2020-07-18T22:34:48Z) - Ensuring Fairness Beyond the Training Data [22.284777913437182]
We develop classifiers that are fair not only with respect to the training distribution but also under a class of distributional perturbations.
Building on an online learning algorithm, we develop an iterative method that converges to a fair and robust solution.
Our experiments show that there is an inherent trade-off between fairness and accuracy of such classifiers.
arXiv Detail & Related papers (2020-07-12T16:20:28Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.