Mitigating Label Bias via Decoupled Confident Learning
- URL: http://arxiv.org/abs/2307.08945v2
- Date: Fri, 29 Sep 2023 16:06:05 GMT
- Title: Mitigating Label Bias via Decoupled Confident Learning
- Authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
- Abstract summary: Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias.
Bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation.
We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias.
- Score: 14.001915875687862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Growing concerns regarding algorithmic fairness have led to a surge in
methodologies to mitigate algorithmic bias. However, such methodologies largely
assume that observed labels in training data are correct. This is problematic
because bias in labels is pervasive across important domains, including
healthcare, hiring, and content moderation. In particular, human-generated
labels are prone to encoding societal biases. While the presence of labeling
bias has been discussed conceptually, there is a lack of methodologies to
address this problem. We propose a pruning method -- Decoupled Confident
Learning (DeCoLe) -- specifically designed to mitigate label bias. After
illustrating its performance on a synthetic dataset, we apply DeCoLe in the
context of hate speech detection, where label bias has been recognized as an
important challenge, and show that it successfully identifies biased labels and
outperforms competing approaches.
Related papers
- Partial-Label Regression [54.74984751371617]
Partial-label learning is a weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels.
Previous studies on partial-label learning only focused on the classification setting where candidate labels are all discrete.
In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels.
arXiv Detail & Related papers (2023-06-15T09:02:24Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis [7.575734557466221]
Crowd workers may sometimes provide unreliable labels.
Truth discovery (TD) algorithms are applied to determine the consensus labels from conflicting worker responses.
We conduct a systematic study of the bias and fairness of TD algorithms.
arXiv Detail & Related papers (2023-04-25T04:56:35Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- De-biased Representation Learning for Fairness with Unreliable Labels [22.794504690957414]
We propose a De-Biased Representation Learning for Fairness (DBRF) framework.
We formulate the de-biased learning framework through information-theoretic concepts such as mutual information and information bottleneck.
Experiment results over both synthetic and real-world data demonstrate that DBRF effectively learns de-biased representations towards ideal labels.
arXiv Detail & Related papers (2022-08-01T07:16:40Z)
- More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias [7.506786114760462]
Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
arXiv Detail & Related papers (2022-07-15T19:30:50Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Learning with Proper Partial Labels [87.65718705642819]
Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z)
- Confident in the Crowd: Bayesian Inference to Improve Data Labelling in Crowdsourcing [0.30458514384586394]
We present new techniques to improve the quality of the labels while attempting to reduce the cost.
This paper investigates the use of more sophisticated methods, such as Bayesian inference, to measure the performance of the labellers.
Our methods outperform the standard voting methods in both cost and accuracy while maintaining higher reliability when there is disagreement within the crowd.
arXiv Detail & Related papers (2021-05-28T17:09:45Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.