Mitigating Label Bias in Machine Learning: Fairness through Confident
Learning
- URL: http://arxiv.org/abs/2312.08749v2
- Date: Sun, 24 Dec 2023 05:57:12 GMT
- Title: Mitigating Label Bias in Machine Learning: Fairness through Confident
Learning
- Authors: Yixuan Zhang, Boyu Li, Zenan Ling and Feng Zhou
- Abstract summary: Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias.
In this paper, we demonstrate that it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning.
- Score: 22.031325797588476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrimination can occur when the underlying unbiased labels are overwritten
by an agent with potential bias, resulting in biased datasets that unfairly
harm specific groups and cause classifiers to inherit these biases. In this
paper, we demonstrate that despite only having access to the biased labels, it
is possible to eliminate bias by filtering the fairest instances within the
framework of confident learning. In the context of confident learning, low
self-confidence usually indicates potential label errors; however, this is not
always the case. Instances, particularly those from underrepresented groups,
might exhibit low confidence scores for reasons other than labeling errors. To
address this limitation, our approach employs truncation of the confidence
score and extends the confidence interval of the probabilistic threshold.
Additionally, we incorporate the co-teaching paradigm to provide a more robust
and reliable selection of fair instances and to effectively mitigate the
adverse effects of biased labels. Through extensive experimentation and
evaluation on various datasets, we demonstrate the efficacy of our approach in
promoting fairness and reducing the impact of label bias in machine learning
models.
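The abstract describes filtering instances via confident learning with a truncated confidence score and an extended probabilistic threshold. The fragment below is a minimal sketch of that idea, assuming out-of-sample predicted probabilities and observed (possibly biased) labels; the function name `select_fair_instances`, the truncation bounds, and the interval width `delta` are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: confident-learning-style selection with a truncated
# self-confidence score and a widened ("extended") per-class threshold.
# Truncation bounds, delta, and the helper name are assumptions for illustration.
import numpy as np

def select_fair_instances(pred_probs, noisy_labels, lower=0.1, upper=0.9, delta=0.05):
    """Keep instances whose truncated self-confidence falls inside an
    extended interval around the per-class probabilistic threshold.

    pred_probs   : (n, k) out-of-sample predicted class probabilities
    noisy_labels : (n,) observed (possibly biased) labels
    lower, upper : truncation bounds on the self-confidence score
    delta        : how much the acceptance interval is widened
    """
    n, k = pred_probs.shape
    # Self-confidence: predicted probability of the observed label.
    self_conf = pred_probs[np.arange(n), noisy_labels]
    # Truncate extreme scores so a few outliers (often from
    # underrepresented groups) do not dominate the decision.
    self_conf = np.clip(self_conf, lower, upper)
    # Per-class threshold: mean truncated self-confidence of instances
    # carrying that observed label, as in standard confident learning.
    thresholds = np.array([
        self_conf[noisy_labels == j].mean() if np.any(noisy_labels == j) else lower
        for j in range(k)
    ])
    # Extend the interval: accept anything within `delta` of the threshold
    # instead of requiring self-confidence >= threshold exactly.
    return self_conf >= (thresholds[noisy_labels] - delta)

# Toy usage: filter a biased dataset before retraining a downstream classifier.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(alpha=[2.0, 2.0], size=200)  # stand-in for model outputs
    labels = rng.integers(0, 2, size=200)
    mask = select_fair_instances(probs, labels)
    print(f"kept {mask.sum()} of {len(mask)} instances")
```

Truncating the self-confidence and accepting anything within `delta` of the per-class threshold is one way to keep low-confidence instances from underrepresented groups from being discarded as label errors by default.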
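The abstract also couples this selection with a co-teaching step. The sketch below shows only the generic co-teaching cross-selection rule (two peer models each pick their lowest-loss instances for the other to update on); the `keep_rate` and the use of per-instance losses as the selection signal are assumptions standing in for however the paper integrates co-teaching with its fairness-aware filtering.

```python
# Minimal sketch of the co-teaching cross-selection step; names and the
# fixed keep_rate are illustrative assumptions.
import numpy as np

def cross_select(loss_a, loss_b, keep_rate=0.8):
    """Each peer keeps its `keep_rate` fraction of smallest-loss instances
    and hands that index set to the *other* peer for its update."""
    n_keep = int(keep_rate * len(loss_a))
    idx_for_b = np.argsort(loss_a)[:n_keep]  # peer A picks data for peer B
    idx_for_a = np.argsort(loss_b)[:n_keep]  # peer B picks data for peer A
    return idx_for_a, idx_for_b

# Toy usage with random per-instance losses from two peer models.
rng = np.random.default_rng(1)
loss_a, loss_b = rng.random(32), rng.random(32)
idx_a, idx_b = cross_select(loss_a, loss_b)
print(len(idx_a), len(idx_b))
```

In practice, the indices each peer receives would drive its parameter update for that mini-batch, making the selection of fair instances less sensitive to any single model's errors.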
Related papers
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias [7.506786114760462]
Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
arXiv Detail & Related papers (2022-07-15T19:30:50Z)
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
- Gradient Based Activations for Accurate Bias-Free Learning [22.264226961225003]
We show that a biased discriminator can actually be used to improve this bias-accuracy tradeoff.
Specifically, this is achieved by using a feature masking approach using the discriminator's gradients.
We show that this simple approach works well to reduce bias as well as improve accuracy significantly.
arXiv Detail & Related papers (2022-02-17T00:30:40Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Fairness-aware Class Imbalanced Learning [57.45784950421179]
We evaluate long-tail learning methods for tweet sentiment and occupation classification.
We extend a margin-loss based approach with methods to enforce fairness.
arXiv Detail & Related papers (2021-09-21T22:16:30Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Bias-Tolerant Fair Classification [20.973916494320246]
Label bias and selection bias are two sources of bias in data that hinder the fairness of machine-learning outcomes.
We propose a Bias-Tolerant FAir Regularized Loss (B-FARL), which tries to regain the benefits using data affected by label bias and selection bias.
B-FARL takes the biased data as input, learns a model that approximates one trained on the fair but latent data, and thus prevents discrimination without requiring explicit fairness constraints.
arXiv Detail & Related papers (2021-07-07T13:31:38Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Group Fairness by Probabilistic Modeling with Latent Fair Decisions [36.20281545470954]
This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label.
We aim to achieve demographic parity by enforcing certain independencies in the learned model.
We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data.
arXiv Detail & Related papers (2020-09-18T19:13:23Z)
- Recovering from Biased Data: Can Fairness Constraints Improve Accuracy? [11.435833538081557]
Empirical Risk Minimization (ERM) may produce a classifier that not only is biased but also has suboptimal accuracy on the true data distribution.
We examine the ability of fairness-constrained ERM to correct this problem.
We also consider other recovery methods including reweighting the training data, Equalized Odds, and Demographic Parity.
arXiv Detail & Related papers (2019-12-02T22:00:14Z)