Mitigating Label Bias in Machine Learning: Fairness through Confident
Learning
- URL: http://arxiv.org/abs/2312.08749v2
- Date: Sun, 24 Dec 2023 05:57:12 GMT
- Title: Mitigating Label Bias in Machine Learning: Fairness through Confident
Learning
- Authors: Yixuan Zhang, Boyu Li, Zenan Ling and Feng Zhou
- Abstract summary: Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias.
In this paper, we demonstrate that it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning.
- Score: 22.031325797588476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrimination can occur when the underlying unbiased labels are overwritten
by an agent with potential bias, resulting in biased datasets that unfairly
harm specific groups and cause classifiers to inherit these biases. In this
paper, we demonstrate that despite only having access to the biased labels, it
is possible to eliminate bias by filtering the fairest instances within the
framework of confident learning. In the context of confident learning, low
self-confidence usually indicates potential label errors; however, this is not
always the case. Instances, particularly those from underrepresented groups,
might exhibit low confidence scores for reasons other than labeling errors. To
address this limitation, our approach employs truncation of the confidence
score and extends the confidence interval of the probabilistic threshold.
Additionally, we incorporate the co-teaching paradigm to provide a more robust
and reliable selection of fair instances and to effectively mitigate the
adverse effects of biased labels. Through extensive experimentation and
evaluation on various datasets, we demonstrate the efficacy of our approach in
promoting fairness and reducing the impact of label bias in machine learning
models.
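The abstract describes filtering instances via confident learning with a truncated confidence score and an extended probabilistic threshold. The fragment below is a minimal sketch of that idea, assuming out-of-sample predicted probabilities and observed (possibly biased) labels; the function name `select_fair_instances`, the truncation bounds, and the interval width `delta` are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: confident-learning-style selection with a truncated
# self-confidence score and a widened ("extended") per-class threshold.
# Truncation bounds, delta, and the helper name are assumptions for illustration.
import numpy as np

def select_fair_instances(pred_probs, noisy_labels, lower=0.1, upper=0.9, delta=0.05):
    """Keep instances whose truncated self-confidence falls inside an
    extended interval around the per-class probabilistic threshold.

    pred_probs   : (n, k) out-of-sample predicted class probabilities
    noisy_labels : (n,) observed (possibly biased) labels
    lower, upper : truncation bounds on the self-confidence score
    delta        : how much the acceptance interval is widened
    """
    n, k = pred_probs.shape
    # Self-confidence: predicted probability of the observed label.
    self_conf = pred_probs[np.arange(n), noisy_labels]
    # Truncate extreme scores so a few outliers (often from
    # underrepresented groups) do not dominate the decision.
    self_conf = np.clip(self_conf, lower, upper)
    # Per-class threshold: mean truncated self-confidence of instances
    # carrying that observed label, as in standard confident learning.
    thresholds = np.array([
        self_conf[noisy_labels == j].mean() if np.any(noisy_labels == j) else lower
        for j in range(k)
    ])
    # Extend the interval: accept anything within `delta` of the threshold
    # instead of requiring self-confidence >= threshold exactly.
    return self_conf >= (thresholds[noisy_labels] - delta)

# Toy usage: filter a biased dataset before retraining a downstream classifier.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(alpha=[2.0, 2.0], size=200)  # stand-in for model outputs
    labels = rng.integers(0, 2, size=200)
    mask = select_fair_instances(probs, labels)
    print(f"kept {mask.sum()} of {len(mask)} instances")
```

Truncating the self-confidence and accepting anything within `delta` of the per-class threshold is one way to keep low-confidence instances from underrepresented groups from being discarded as label errors by default.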
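The abstract also couples this selection with a co-teaching step. The sketch below shows only the generic co-teaching cross-selection rule (two peer models each pick their lowest-loss instances for the other to update on); the `keep_rate` and the use of per-instance losses as the selection signal are assumptions standing in for however the paper integrates co-teaching with its fairness-aware filtering.

```python
# Minimal sketch of the co-teaching cross-selection step; names and the
# fixed keep_rate are illustrative assumptions.
import numpy as np

def cross_select(loss_a, loss_b, keep_rate=0.8):
    """Each peer keeps its `keep_rate` fraction of smallest-loss instances
    and hands that index set to the *other* peer for its update."""
    n_keep = int(keep_rate * len(loss_a))
    idx_for_b = np.argsort(loss_a)[:n_keep]  # peer A picks data for peer B
    idx_for_a = np.argsort(loss_b)[:n_keep]  # peer B picks data for peer A
    return idx_for_a, idx_for_b

# Toy usage with random per-instance losses from two peer models.
rng = np.random.default_rng(1)
loss_a, loss_b = rng.random(32), rng.random(32)
idx_a, idx_b = cross_select(loss_a, loss_b)
print(len(idx_a), len(idx_b))
```

In practice, the indices each peer receives would drive its parameter update for that mini-batch, making the selection of fair instances less sensitive to any single model's errors.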
Related papers
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias [7.506786114760462]
Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
arXiv Detail & Related papers (2022-07-15T19:30:50Z)
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
- Gradient Based Activations for Accurate Bias-Free Learning [22.264226961225003]
We show that a biased discriminator can actually be used to improve this bias-accuracy tradeoff.
Specifically, this is achieved by using a feature masking approach using the discriminator's gradients.
We show that this simple approach works well to reduce bias as well as improve accuracy significantly.
arXiv Detail & Related papers (2022-02-17T00:30:40Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Fairness-aware Class Imbalanced Learning [57.45784950421179]
We evaluate long-tail learning methods for tweet sentiment and occupation classification.
We extend a margin-loss based approach with methods to enforce fairness.
arXiv Detail & Related papers (2021-09-21T22:16:30Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Bias-Tolerant Fair Classification [20.973916494320246]
Label bias and selection bias are two sources of bias in data that hinder the fairness of machine-learning outcomes.
We propose a Bias-Tolerant FAir Regularized Loss (B-FARL), which tries to regain the benefits using data affected by label bias and selection bias.
B-FARL takes the biased data as input, learns a model that approximates one trained on the fair but latent data, and thus prevents discrimination without requiring explicit fairness constraints.
arXiv Detail & Related papers (2021-07-07T13:31:38Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Group Fairness by Probabilistic Modeling with Latent Fair Decisions [36.20281545470954]
This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label.
We aim to achieve demographic parity by enforcing certain independencies in the learned model.
We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data.
arXiv Detail & Related papers (2020-09-18T19:13:23Z)
- Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? [11.435833538081557]
Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution.
We examine the ability of fairness-constrained ERM to correct this problem.
We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity.
arXiv Detail & Related papers (2019-12-02T22:00:14Z)