The Importance of Modeling Data Missingness in Algorithmic Fairness: A
Causal Perspective
- URL: http://arxiv.org/abs/2012.11448v1
- Date: Mon, 21 Dec 2020 16:10:00 GMT
- Title: The Importance of Modeling Data Missingness in Algorithmic Fairness: A
Causal Perspective
- Authors: Naman Goel, Alfonso Amayuelas, Amit Deshpande, Amit Sharma
- Abstract summary: Training datasets for machine learning often have some form of missingness.
This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed.
We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data.
- Score: 14.622708494548363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training datasets for machine learning often have some form of missingness.
For example, to learn a model for deciding whom to give a loan, the available
training data includes individuals who were given a loan in the past, but not
those who were not. This missingness, if ignored, nullifies any fairness
guarantee of the training procedure when the model is deployed. Using causal
graphs, we characterize the missingness mechanisms in different real-world
scenarios. We show conditions under which various distributions, used in
popular fairness algorithms, can or cannot be recovered from the training
data. Our theoretical results imply that many of these algorithms cannot
guarantee fairness in practice. Modeling missingness also helps to identify
correct design principles for fair algorithms. For example, in multi-stage
settings where decisions are made in multiple screening rounds, we use our
framework to derive the minimal distributions required to design a fair
algorithm. Our proposed algorithm decentralizes the decision-making process and
still achieves similar performance to the optimal algorithm that requires
centralization and non-recoverable distributions.
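As a concrete illustration of the loan example above, the following minimal simulation (our own sketch, not the paper's code; the group-dependent approval thresholds and all constants are assumptions) shows how a fairness statistic estimated from selectively labeled training data can diverge from its population value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.integers(0, 2, n)                             # protected attribute
score = rng.normal(0.0, 1.0, n) + 0.3 * a             # creditworthiness signal
repay = rng.random(n) < 1.0 / (1.0 + np.exp(-score))  # true repayment outcome

# Historical policy: outcomes were only recorded for approved applicants,
# and the approval threshold differed by group, so missingness depends on
# both the past decision and the protected attribute.
approved = score > np.where(a == 1, 0.5, 0.0)

def repayment_gap(mask):
    """Difference in repayment rate between groups, on the masked subset."""
    return repay[mask & (a == 1)].mean() - repay[mask & (a == 0)].mean()

print("gap on full population   :", repayment_gap(np.ones(n, dtype=bool)))
print("gap on approved-only data:", repayment_gap(approved))
```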
Related papers
- Towards Harmless Rawlsian Fairness Regardless of Demographic Prior [57.30787578956235]
We explore the potential for achieving fairness without compromising its utility when no prior demographics are provided to the training set.
We propose a simple but effective method named VFair to minimize the variance of training losses inside the optimal set of empirical losses.
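A rough sketch of the variance-penalty idea, assuming a soft penalty stands in for the paper's constrained formulation (the `lam` weight and the loss choice are illustrative, not VFair's exact objective):

```python
import torch
import torch.nn.functional as F

def vfair_style_loss(logits, targets, lam=1.0):
    """Mean loss plus a penalty on the variance of per-example losses,
    so no hidden subgroup carries a disproportionate share of the loss.
    `targets` is a float tensor with the same shape as `logits`."""
    per_example = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    return per_example.mean() + lam * per_example.var()
```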
arXiv Detail & Related papers (2024-11-04T12:40:34Z)
- SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
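In the same spirit, a generic EM loop (a sketch in the style of Saerens et al., 2002, not SimPro's actual algorithm) can re-estimate an unknown, possibly long-tailed class prior from a fixed model's probabilities on unlabeled data:

```python
import numpy as np

def em_class_prior(probs, train_prior, n_iter=50):
    """probs: (n, k) class probabilities of a fixed model on unlabeled data,
    computed under `train_prior`; returns the re-estimated class prior."""
    prior = train_prior.copy()
    for _ in range(n_iter):
        w = probs * (prior / train_prior)        # E-step: re-weight by prior ratio
        resp = w / w.sum(axis=1, keepdims=True)  # posterior responsibilities
        prior = resp.mean(axis=0)                # M-step: update the prior
    return prior
```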
arXiv Detail & Related papers (2024-02-21T03:39:04Z)
- Fairness Uncertainty Quantification: How certain are you that the model is fair? [13.209748908186606]
In modern machine learning, Stochastic Gradient Descent (SGD) type algorithms are almost always used for training, implying that the learned model, and consequently its fairness properties, are random.
In this work we provide a Confidence Interval (CI) for test unfairness when a group-fairness-aware linear binary classifier, specifically one aware of Disparate Impact (DI) or Disparate Mistreatment (DM), is trained using online SGD-type algorithms.
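The paper derives asymptotic CIs for online SGD; as a purely illustrative substitute, the same randomness can be gauged with a naive bootstrap over training resamples (synthetic data, not the paper's procedure):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 5))
a = (X[:, 0] > 0).astype(int)                      # protected attribute proxy
y = ((X @ rng.normal(size=5)) + 0.5 * a > 0).astype(int)

def disparate_impact(pred, a):
    """Ratio of positive-prediction rates between the two groups."""
    return pred[a == 1].mean() / pred[a == 0].mean()

di_values = []
for seed in range(200):                            # resample the training set
    idx = rng.integers(0, len(y), len(y))
    clf = SGDClassifier(random_state=seed).fit(X[idx], y[idx])
    di_values.append(disparate_impact(clf.predict(X), a))

lo, hi = np.percentile(di_values, [2.5, 97.5])
print(f"bootstrap 95% CI for DI: [{lo:.3f}, {hi:.3f}]")
```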
arXiv Detail & Related papers (2023-04-27T04:07:58Z)
- On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning [13.149070833843133]
Machine unlearning, i.e., having a model forget some of its training data, has become increasingly important as privacy legislation promotes variants of the right-to-be-forgotten.
We first show that the definition that underlies approximate unlearning, which seeks to prove the approximately unlearned model is close to an exactly retrained model, is incorrect because one can obtain the same model using different datasets.
We then turn to exact unlearning approaches and ask how to verify their claims of unlearning.
arXiv Detail & Related papers (2021-10-22T16:16:56Z)
- Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
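A generic sketch of such a distributionally robust objective (not the paper's graph-specific method): approximate the worst case over a small perturbation neighborhood by inner gradient ascent, then minimize the resulting loss.

```python
import torch
import torch.nn.functional as F

def robust_loss(model, x, y, eps=0.1, steps=3, step_size=0.05):
    """Inner maximization over bounded input perturbations (PGD-style),
    a common surrogate for a distributionally robust objective."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()
            delta.clamp_(-eps, eps)              # stay in the neighborhood
    return F.cross_entropy(model(x + delta), y)  # outer minimization target
```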
arXiv Detail & Related papers (2021-10-20T14:23:54Z)
- Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
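The general mechanism can be previewed with a tree learner that routes missing values natively at each split, skipping the imputation stage entirely (this sketch uses scikit-learn's built-in NaN support, not the authors' algorithm):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan   # knock out 20% of the values

# NaNs are handled inside each split (routed to whichever child scores
# better), so no separate imputation model is needed.
clf = HistGradientBoostingClassifier().fit(X, y)
print("training accuracy:", clf.score(X, y))
```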
arXiv Detail & Related papers (2021-09-21T20:46:22Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
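For reference, a minimal form of the uncertainty-based acquisition step under study (predictive entropy here; BALD instead scores mutual information via MC-dropout):

```python
import numpy as np

def entropy_acquisition(probs, k):
    """Return indices of the k unlabeled points with the highest
    predictive entropy, i.e. those the model is least sure about.
    probs: (n, n_classes) predicted class probabilities."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:k]
```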
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms [0.0]
This study is the most comprehensive of its kind.
It considers fairness, predictive performance, calibration quality, and speed of 28 different modelling pipelines.
We also found that fairness-aware algorithms can induce fairness without material drops in predictive power.
arXiv Detail & Related papers (2020-10-08T13:58:09Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
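A bare-bones version of the pseudo-labeling step (the fairness-aware pre-processing the paper layers on top is omitted; the model choice is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_and_retrain(X_lab, y_lab, X_unlab):
    """Fit on labeled data, label the unlabeled pool, retrain on the union."""
    base = LogisticRegression().fit(X_lab, y_lab)
    y_pseudo = base.predict(X_unlab)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return LogisticRegression().fit(X_all, y_all)
```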
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little computational overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness [7.673007415383724]
We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks.
We have applied 7 mitigation techniques on these models and analyzed the fairness, mitigation results, and impacts on performance.
arXiv Detail & Related papers (2020-05-21T23:35:53Z)