The Importance of Modeling Data Missingness in Algorithmic Fairness: A
Causal Perspective
- URL: http://arxiv.org/abs/2012.11448v1
- Date: Mon, 21 Dec 2020 16:10:00 GMT
- Title: The Importance of Modeling Data Missingness in Algorithmic Fairness: A
Causal Perspective
- Authors: Naman Goel, Alfonso Amayuelas, Amit Deshpande, Amit Sharma
- Abstract summary: Training datasets for machine learning often have some form of missingness.
This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed.
We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data.
- Score: 14.622708494548363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training datasets for machine learning often have some form of missingness.
For example, to learn a model for deciding whom to give a loan, the available
training data includes individuals who were given a loan in the past, but not
those who were not. This missingness, if ignored, nullifies any fairness
guarantee of the training procedure when the model is deployed. Using causal
graphs, we characterize the missingness mechanisms in different real-world
scenarios. We show conditions under which various distributions, used in
popular fairness algorithms, can or cannot be recovered from the training
data. Our theoretical results imply that many of these algorithms cannot
guarantee fairness in practice. Modeling missingness also helps to identify
correct design principles for fair algorithms. For example, in multi-stage
settings where decisions are made in multiple screening rounds, we use our
framework to derive the minimal distributions required to design a fair
algorithm. Our proposed algorithm decentralizes the decision-making process and
still achieves similar performance to the optimal algorithm that requires
centralization and non-recoverable distributions.
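As a concrete illustration of the loan example above, the following minimal simulation (our own sketch, not the paper's code; the group-dependent approval thresholds and all constants are assumptions) shows how a fairness statistic estimated from selectively labeled training data can diverge from its population value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.integers(0, 2, n)                             # protected attribute
score = rng.normal(0.0, 1.0, n) + 0.3 * a             # creditworthiness signal
repay = rng.random(n) < 1.0 / (1.0 + np.exp(-score))  # true repayment outcome

# Historical policy: outcomes were only recorded for approved applicants,
# and the approval threshold differed by group, so missingness depends on
# both the past decision and the protected attribute.
approved = score > np.where(a == 1, 0.5, 0.0)

def repayment_gap(mask):
    """Difference in repayment rate between groups, on the masked subset."""
    return repay[mask & (a == 1)].mean() - repay[mask & (a == 0)].mean()

print("gap on full population   :", repayment_gap(np.ones(n, dtype=bool)))
print("gap on approved-only data:", repayment_gap(approved))
```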
Related papers
- Towards Harmless Rawlsian Fairness Regardless of Demographic Prior [57.30787578956235]
We explore the potential for achieving fairness without compromising its utility when no prior demographics are provided to the training set.
We propose a simple but effective method named VFair to minimize the variance of training losses inside the optimal set of empirical losses.
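A rough sketch of the variance-penalty idea, assuming a soft penalty stands in for the paper's constrained formulation (the `lam` weight and the loss choice are illustrative, not VFair's exact objective):

```python
import torch
import torch.nn.functional as F

def vfair_style_loss(logits, targets, lam=1.0):
    """Mean loss plus a penalty on the variance of per-example losses,
    so no hidden subgroup carries a disproportionate share of the loss.
    `targets` is a float tensor with the same shape as `logits`."""
    per_example = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    return per_example.mean() + lam * per_example.var()
```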
arXiv Detail & Related papers (2024-11-04T12:40:34Z)
- SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
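In the same spirit, a generic EM loop (a sketch in the style of Saerens et al., 2002, not SimPro's actual algorithm) can re-estimate an unknown, possibly long-tailed class prior from a fixed model's probabilities on unlabeled data:

```python
import numpy as np

def em_class_prior(probs, train_prior, n_iter=50):
    """probs: (n, k) class probabilities of a fixed model on unlabeled data,
    computed under `train_prior`; returns the re-estimated class prior."""
    prior = train_prior.copy()
    for _ in range(n_iter):
        w = probs * (prior / train_prior)        # E-step: re-weight by prior ratio
        resp = w / w.sum(axis=1, keepdims=True)  # posterior responsibilities
        prior = resp.mean(axis=0)                # M-step: update the prior
    return prior
```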
arXiv Detail & Related papers (2024-02-21T03:39:04Z)
- Fairness Uncertainty Quantification: How certain are you that the model is fair? [13.209748908186606]
In modern machine learning, Stochastic Gradient Descent (SGD) type algorithms are almost always used for training, implying that the learned model, and consequently its fairness properties, are random.
In this work we provide a Confidence Interval (CI) for test unfairness when a group-fairness-aware linear binary classifier, specifically one aware of Disparate Impact (DI) or Disparate Mistreatment (DM), is trained using online SGD-type algorithms.
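The paper derives asymptotic CIs for online SGD; as a purely illustrative substitute, the same randomness can be gauged with a naive bootstrap over training resamples (synthetic data, not the paper's procedure):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 5))
a = (X[:, 0] > 0).astype(int)                      # protected attribute proxy
y = ((X @ rng.normal(size=5)) + 0.5 * a > 0).astype(int)

def disparate_impact(pred, a):
    """Ratio of positive-prediction rates between the two groups."""
    return pred[a == 1].mean() / pred[a == 0].mean()

di_values = []
for seed in range(200):                            # resample the training set
    idx = rng.integers(0, len(y), len(y))
    clf = SGDClassifier(random_state=seed).fit(X[idx], y[idx])
    di_values.append(disparate_impact(clf.predict(X), a))

lo, hi = np.percentile(di_values, [2.5, 97.5])
print(f"bootstrap 95% CI for DI: [{lo:.3f}, {hi:.3f}]")
```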
arXiv Detail & Related papers (2023-04-27T04:07:58Z)
- On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning [13.149070833843133]
Machine unlearning, i.e., having a model forget some of its training data, has become increasingly important as privacy legislation promotes variants of the right-to-be-forgotten.
We first show that the definition that underlies approximate unlearning, which seeks to prove the approximately unlearned model is close to an exactly retrained model, is incorrect because one can obtain the same model using different datasets.
We then turn to exact unlearning approaches and ask how to verify their claims of unlearning.
arXiv Detail & Related papers (2021-10-22T16:16:56Z)
- Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
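A generic sketch of such a distributionally robust objective (not the paper's graph-specific method): approximate the worst case over a small perturbation neighborhood by inner gradient ascent, then minimize the resulting loss.

```python
import torch
import torch.nn.functional as F

def robust_loss(model, x, y, eps=0.1, steps=3, step_size=0.05):
    """Inner maximization over bounded input perturbations (PGD-style),
    a common surrogate for a distributionally robust objective."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()
            delta.clamp_(-eps, eps)              # stay in the neighborhood
    return F.cross_entropy(model(x + delta), y)  # outer minimization target
```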
arXiv Detail & Related papers (2021-10-20T14:23:54Z)
- Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
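The general mechanism can be previewed with a tree learner that routes missing values natively at each split, skipping the imputation stage entirely (this sketch uses scikit-learn's built-in NaN support, not the authors' algorithm):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan   # knock out 20% of the values

# NaNs are handled inside each split (routed to whichever child scores
# better), so no separate imputation model is needed.
clf = HistGradientBoostingClassifier().fit(X, y)
print("training accuracy:", clf.score(X, y))
```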
arXiv Detail & Related papers (2021-09-21T20:46:22Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
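For reference, a minimal form of the uncertainty-based acquisition step under study (predictive entropy here; BALD instead scores mutual information via MC-dropout):

```python
import numpy as np

def entropy_acquisition(probs, k):
    """Return indices of the k unlabeled points with the highest
    predictive entropy, i.e. those the model is least sure about.
    probs: (n, n_classes) predicted class probabilities."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:k]
```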
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms [0.0]
This study is the most comprehensive of its kind.
It considers fairness, predictive performance, calibration quality, and speed of 28 different modelling pipelines.
We also found that fairness-aware algorithms can induce fairness without material drops in predictive power.
arXiv Detail & Related papers (2020-10-08T13:58:09Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
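A bare-bones version of the pseudo-labeling step (the fairness-aware pre-processing the paper layers on top is omitted; the model choice is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_and_retrain(X_lab, y_lab, X_unlab):
    """Fit on labeled data, label the unlabeled pool, retrain on the union."""
    base = LogisticRegression().fit(X_lab, y_lab)
    y_pseudo = base.predict(X_unlab)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return LogisticRegression().fit(X_all, y_all)
```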
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little computational overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness [7.673007415383724]
We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks.
We have applied 7 mitigation techniques on these models and analyzed the fairness, mitigation results, and impacts on performance.
arXiv Detail & Related papers (2020-05-21T23:35:53Z)