Can We Achieve Fairness Using Semi-Supervised Learning?
- URL: http://arxiv.org/abs/2111.02038v2
- Date: Thu, 4 Nov 2021 05:26:47 GMT
- Title: Can We Achieve Fairness Using Semi-Supervised Learning?
- Authors: Joymallya Chakraborty, Huy Tu, Suvodeep Majumder, Tim Menzies
- Abstract summary: We use semi-supervised techniques to create fair classification models.
Our framework, Fair-SSL, takes a very small amount of labeled data as input and generates pseudo-labels for the unlabeled data.
Fair-SSL achieves performance similar to that of three state-of-the-art bias mitigation algorithms.
- Score: 13.813788753789428
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Ethical bias in machine learning models has become a matter of concern in the
software engineering community. Most of the prior software engineering works
concentrated on finding ethical bias in models rather than fixing it. After
finding bias, the next step is mitigation. Prior researchers mainly tried to
use supervised approaches to achieve fairness. However, in the real world,
getting data with trustworthy ground truth is challenging and also ground truth
can contain human bias. Semi-supervised learning is a machine learning
technique where, incrementally, labeled data is used to generate pseudo-labels
for the rest of the data (and then all that data is used for model training).
In this work, we apply four popular semi-supervised techniques as
pseudo-labelers to create fair classification models. Our framework, Fair-SSL,
takes a very small amount (10%) of labeled data as input and generates
pseudo-labels for the unlabeled data. We then synthetically generate new data
points to balance the training data based on class and protected attribute as
proposed by Chakraborty et al. in FSE 2021. Finally, the classification model
is trained on the balanced pseudo-labeled data and validated on test data.
After experimenting on ten datasets and three learners, we find that Fair-SSL
achieves performance similar to that of three state-of-the-art bias mitigation
algorithms. That said, the clear advantage of Fair-SSL is that it requires only
10% of the labeled training data. To the best of our knowledge, this is the
first SE work where semi-supervised techniques are used to fight against
ethical bias in SE ML models.
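The pipeline described in the abstract can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: scikit-learn's `SelfTrainingClassifier` stands in for the four pseudo-labelers, random oversampling by (class, protected attribute) group stands in for the synthetic data generation of Chakraborty et al. (FSE 2021), and the `protected` attribute is a hypothetical binary column.

```python
# Sketch of a Fair-SSL-style pipeline (simplified, not the authors' code):
# 1) keep labels for only 10% of the data, 2) pseudo-label the rest via
# self-training, 3) rebalance by (class, protected attribute), 4) train.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
protected = rng.integers(0, 2, size=len(y))  # hypothetical binary attribute

# Step 1: mark 90% of the rows as unlabeled (-1, sklearn's convention).
y_ssl = y.copy()
y_ssl[rng.random(len(y)) > 0.10] = -1

# Step 2: self-training generates pseudo-labels for the unlabeled rows.
ssl = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
ssl.fit(X, y_ssl)
y_pseudo = ssl.predict(X)

# Step 3: naive rebalancing by (pseudo-label, protected attribute) group;
# random oversampling here stands in for synthetic data generation.
groups = {}
for c in np.unique(y_pseudo):
    for p in np.unique(protected):
        idx = np.where((y_pseudo == c) & (protected == p))[0]
        if len(idx):
            groups[(c, p)] = idx
n_max = max(len(idx) for idx in groups.values())
balanced = np.concatenate(
    [rng.choice(idx, size=n_max, replace=True) for idx in groups.values()]
)

# Step 4: train the final classifier on the balanced pseudo-labeled data.
clf = LogisticRegression(max_iter=1000).fit(X[balanced], y_pseudo[balanced])
print(clf.score(X, y))  # sanity check against the true labels
```

After balancing, every (class, protected attribute) group contributes the same number of training rows, which is the core of the bias-mitigation step.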
Related papers
- Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Boosting Semi-Supervised Learning by bridging high and low-confidence predictions [4.18804572788063]
Pseudo-labeling is a crucial technique in semi-supervised learning (SSL).
We propose a new method called ReFixMatch, which aims to utilize all of the unlabeled data during training.
arXiv Detail & Related papers (2023-08-15T00:27:18Z)
- On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z)
- Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a natural phenomenon that occurs widely but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data is curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
arXiv Detail & Related papers (2022-01-05T07:40:24Z)
- Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias [10.639605996067534]
Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy.
In COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men.
We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class.
arXiv Detail & Related papers (2021-10-20T06:00:03Z)
- BiFair: Training Fair Models with Bilevel Optimization [8.2509884277533]
We develop a new training algorithm, named BiFair, which jointly minimizes a utility loss and a fairness loss of interest.
Our algorithm consistently performs better, i.e., it reaches better values of a given fairness metric at the same or higher accuracy.
arXiv Detail & Related papers (2021-06-03T22:36:17Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)
- Leveraging Semi-Supervised Learning for Fairness using Neural Networks [49.604038072384995]
There has been a growing concern about the fairness of decision-making systems based on machine learning.
In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data.
The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
arXiv Detail & Related papers (2019-12-31T09:11:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.