Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination
- URL: http://arxiv.org/abs/2009.12040v1
- Date: Fri, 25 Sep 2020 05:48:56 GMT
- Title: Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination
- Authors: Tao Zhang, Tianqing Zhu, Jing Li, Mengde Han, Wanlei Zhou, and Philip
S. Yu
- Abstract summary: A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
- Score: 53.3082498402884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing specter in the rise of machine learning is whether the decisions
made by machine learning models are fair. While research is already underway to
formalize a machine-learning concept of fairness and to design frameworks for
building fair models at some sacrifice in accuracy, most are geared toward either
supervised or unsupervised learning. Yet two observations inspired us to wonder
whether semi-supervised learning might be useful to solve discrimination
problems. First, previous studies showed that increasing the size of the
training set may lead to a better trade-off between fairness and accuracy.
Second, the most powerful models today require an enormous amount of data to
train, which, in practical terms, is most readily assembled from a combination
of labeled and unlabeled data. Hence, in this paper, we present a framework for
fair semi-supervised
learning in the pre-processing phase, including pseudo labeling to predict
labels for unlabeled data; a re-sampling method to obtain multiple fair
datasets; and, lastly, ensemble learning to improve accuracy and decrease
discrimination. A theoretical decomposition analysis of bias, variance and
noise highlights the different sources of discrimination and the impact they
have on fairness in semi-supervised learning. Experiments on real-world and
synthetic datasets show that our method is able to use unlabeled data to
achieve a better trade-off between accuracy and discrimination.
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z) - Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z) - Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models getting deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z) - Adversarial Stacked Auto-Encoders for Fair Representation Learning [1.061960673667643]
We propose a new fair representation learning approach that leverages different levels of representation of data to tighten the fairness bounds of the learned representation.
Our results show that stacking different auto-encoders and enforcing fairness at different latent spaces result in an improvement of fairness compared to other existing approaches.
arXiv Detail & Related papers (2021-07-27T13:49:18Z) - MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z) - Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z) - On Adversarial Bias and the Robustness of Fair Machine Learning [11.584571002297217]
We show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness.
An adversary who can control sampling or labeling for a fraction of the training data can reduce the test accuracy significantly beyond what they could achieve on unconstrained models.
We analyze the robustness of fair machine learning through an empirical evaluation of attacks on multiple algorithms and benchmark datasets.
arXiv Detail & Related papers (2020-06-15T18:17:44Z) - Fairness-Aware Learning with Prejudice Free Representations [2.398608007786179]
We propose a novel algorithm that can effectively identify and treat latent discriminating features.
The approach helps to collect discrimination-free features that would improve the model performance.
arXiv Detail & Related papers (2020-02-26T10:06:31Z) - Learning from Discriminatory Training Data [2.1869017389979266]
Supervised learning systems are trained using historical data and, if the data was tainted by discrimination, they may unintentionally learn to discriminate against protected groups.
We propose that fair learning methods, despite training on potentially discriminatory datasets, should perform well on fair test datasets.
arXiv Detail & Related papers (2019-12-17T18:53:23Z)