Fairness without Imputation: A Decision Tree Approach for Fair
Prediction with Missing Values
- URL: http://arxiv.org/abs/2109.10431v1
- Date: Tue, 21 Sep 2021 20:46:22 GMT
- Title: Fairness without Imputation: A Decision Tree Approach for Fair
Prediction with Missing Values
- Authors: Haewon Jeong, Hao Wang, Flavio P. Calmon
- Abstract summary: We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
- Score: 4.973456986972679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the fairness concerns of training a machine learning model
using data with missing values. Even though there are a number of fairness
intervention methods in the literature, most of them require a complete
training set as input. In practice, data can have missing values, and data
missing patterns can depend on group attributes (e.g. gender or race). Simply
applying off-the-shelf fair learning algorithms to an imputed dataset may lead
to an unfair model. In this paper, we first theoretically analyze different
sources of discrimination risks when training with an imputed dataset. Then, we
propose an integrated approach based on decision trees that does not require a
separate process of imputation and learning. Instead, we train a tree with
missing incorporated as attribute (MIA), which does not require explicit
imputation, and we optimize a fairness-regularized objective function. We
demonstrate, through several experiments on real-world datasets, that our
approach outperforms existing fairness intervention methods applied to an
imputed dataset.
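As a concrete illustration of the integrated approach described above, the following is a minimal sketch (not the authors' implementation) of how a single candidate split could be scored under missing incorporated as attribute (MIA) with a fairness regularizer: missing values are routed wholesale to one child or the other, and the split criterion combines Gini impurity with a demographic-parity penalty. The penalty form, the weight `lam`, and the majority-vote child predictions are illustrative assumptions.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector (0.0 for an empty node)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def dp_gap(y_hat, group):
    """Demographic-parity gap |P(pred=1 | group=1) - P(pred=1 | group=0)|."""
    rates = [y_hat[group == g].mean() if np.any(group == g) else 0.0 for g in (0, 1)]
    return abs(rates[1] - rates[0])

def mia_split_score(x, y, group, threshold, lam=0.5):
    """Score a candidate split of feature x at `threshold`.

    Missing values (NaN) are routed wholesale to the left or right child
    (the MIA trick), and the better of the two routings is kept, so no
    imputation is needed. Lower score = purer children + smaller parity gap.
    """
    missing = np.isnan(x)
    best = np.inf
    for send_missing_left in (True, False):
        left = np.where(missing, send_missing_left, x <= threshold)
        right = ~left
        impurity = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / len(y)
        # Use each child's majority label as its prediction to estimate disparity.
        left_pred = int(y[left].mean() >= 0.5) if left.any() else 0
        right_pred = int(y[right].mean() >= 0.5) if right.any() else 0
        y_hat = np.where(left, left_pred, right_pred)
        best = min(best, impurity + lam * dp_gap(y_hat, group))
    return best
```

A full tree learner along these lines would evaluate this score over all features and thresholds at each node and grow the tree greedily, which is why no separate imputation step is required.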
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z) - Certifying Robustness to Programmable Data Bias in Decision Trees [12.060443368097102]
We certify that models produced by a learning algorithm are pointwise-robust to potential dataset biases.
Our approach allows specifying bias models across a variety of dimensions.
We evaluate our approach on datasets commonly used in the fairness literature.
arXiv Detail & Related papers (2021-10-08T20:15:17Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach, which allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an initially untrained student model, trained on the teachers' output, reaches F1-scores comparable to the teacher's.
arXiv Detail & Related papers (2021-02-01T14:38:54Z) - The Importance of Modeling Data Missingness in Algorithmic Fairness: A
Causal Perspective [14.622708494548363]
Training datasets for machine learning often have some form of missingness.
This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed.
We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data.
arXiv Detail & Related papers (2020-12-21T16:10:00Z) - Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - On the consistency of supervised learning with missing values [15.666860186278782]
In many application settings, the data have missing entries which make analysis challenging.
Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data.
We show that the widely-used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative (see the sketch after this entry).
arXiv Detail & Related papers (2019-02-19T07:27:19Z)
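For reference, a minimal sketch of the constant-imputation baseline discussed in that last entry: fill each feature's missing entries with its training-set mean, then fit an off-the-shelf learner. The synthetic data and the choice of classifier are illustrative assumptions; only scikit-learn's SimpleImputer is relied on for the imputation itself.

```python
# Minimal sketch: constant (mean) imputation prior to learning.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # illustrative features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # illustrative labels
X[rng.random(X.shape) < 0.2] = np.nan          # entries missing completely at random

imputer = SimpleImputer(strategy="mean")       # constant imputation with the mean
X_imputed = imputer.fit_transform(X)           # means are estimated on the training data
clf = RandomForestClassifier(random_state=0).fit(X_imputed, y)
# At test time, reuse the same training means: clf.predict(imputer.transform(X_test)).
```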
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.