Mitigating Source Bias for Fairer Weak Supervision
- URL: http://arxiv.org/abs/2303.17713v3
- Date: Wed, 29 Nov 2023 18:10:41 GMT
- Title: Mitigating Source Bias for Fairer Weak Supervision
- Authors: Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala
- Abstract summary: Weak supervision enables efficient development of training sets by reducing the need for ground truth labels.
We show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.
A simple extension of our method aimed at maximizing performance produces state-of-the-art performance in five out of ten datasets in the WRENCH benchmark.
- Score: 13.143596481809508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weak supervision enables efficient development of training sets by reducing
the need for ground truth labels. However, the techniques that make weak
supervision attractive -- such as integrating any source of signal to estimate
unknown labels -- also entail the danger that the produced pseudolabels are
highly biased. Surprisingly, given everyday use and the potential for increased
bias, weak supervision has not been studied from the point of view of fairness.
We begin such a study, starting with the observation that even when a fair
model can be built from a dataset with access to ground-truth labels, the
corresponding dataset labeled via weak supervision can be arbitrarily unfair.
To address this, we propose and empirically validate a model for source
unfairness in weak supervision, then introduce a simple counterfactual
fairness-based technique that can mitigate these biases. Theoretically, we show
that it is possible for our approach to simultaneously improve both accuracy
and fairness -- in contrast to standard fairness approaches that suffer from
tradeoffs. Empirically, we show that our technique improves accuracy on weak
supervision baselines by as much as 32% while reducing demographic parity gap
by 82.5%. A simple extension of our method aimed at maximizing performance
produces state-of-the-art performance in five out of ten datasets in the WRENCH
benchmark.
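The abstract measures fairness through the demographic parity gap of the pseudolabels that weak supervision produces, but it does not spell out the counterfactual correction itself. As a point of reference only, the minimal Python sketch below (not the authors' method; the toy labeling functions, data, and function names are illustrative assumptions) aggregates labeling-function votes by majority vote and shows how a single group-correlated source inflates that gap.

```python
import numpy as np

def majority_vote(lf_votes, abstain=-1):
    """Aggregate labeling-function votes (n_examples x n_sources, values in
    {0, 1, abstain}) into binary pseudolabels by simple majority vote."""
    labels = np.zeros(lf_votes.shape[0], dtype=int)
    for i, row in enumerate(lf_votes):
        valid = row[row != abstain]
        labels[i] = int(valid.size > 0 and valid.mean() >= 0.5)
    return labels

def demographic_parity_gap(y_pred, group):
    """|P(yhat = 1 | a = 0) - P(yhat = 1 | a = 1)| for a binary attribute a."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Toy data: two noisy, group-agnostic sources plus one source that fires
# only for group 1, i.e. a biased source of the kind the paper studies.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
lf_votes = np.stack([
    rng.integers(0, 2, size=1000),
    rng.integers(0, 2, size=1000),
    (group == 1).astype(int),
], axis=1)
pseudolabels = majority_vote(lf_votes)
print("demographic parity gap:", demographic_parity_gap(pseudolabels, group))
```

In this toy setup the group-correlated third source pushes the positive pseudolabel rate for one group well above the other's, which is the kind of source-induced unfairness the paper aims to mitigate.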
Related papers
- Fair Bilevel Neural Network (FairBiNN): On Balancing fairness and accuracy via Stackelberg Equilibrium [0.3350491650545292]
Current methods for mitigating bias often result in information loss and an inadequate balance between accuracy and fairness.
We propose a novel methodology grounded in bilevel optimization principles.
Our deep learning-based approach concurrently optimizes both accuracy and fairness objectives.
arXiv Detail & Related papers (2024-10-21T18:53:39Z)
- Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z)
- Mitigating Label Bias in Machine Learning: Fairness through Confident Learning [22.031325797588476]
Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias.
In this paper, we demonstrate that it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning (a generic sketch of this style of filtering appears after this list).
arXiv Detail & Related papers (2023-12-14T08:55:38Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against subgroups defined by protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Fair-CDA: Continuous and Directional Augmentation for Group Fairness [48.84385689186208]
We propose a fine-grained data augmentation strategy for imposing fairness constraints.
We show that group fairness can be achieved by regularizing the models on transition paths of sensitive features between groups.
Our proposed method does not assume any data generative model and ensures good generalization for both accuracy and fairness.
arXiv Detail & Related papers (2023-04-01T11:23:00Z)
- On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z)
- Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representations (see the sketch after this list).
arXiv Detail & Related papers (2021-08-06T05:20:46Z)
- Fair Densities via Boosting the Sufficient Statistics of Exponential Families [72.34223801798422]
We introduce a boosting algorithm to pre-process data for fairness.
Our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee.
Empirical results are presented to demonstrate the quality of the results on real-world data.
arXiv Detail & Related papers (2020-12-01T00:49:17Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)
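For the entry "Mitigating Label Bias in Machine Learning: Fairness through Confident Learning", the summary only names the framework. The sketch below is a simplified, generic illustration of confident-learning-style filtering, not that paper's algorithm; the function and variable names are assumptions. It flags examples whose out-of-sample predicted probabilities clear another class's self-confidence threshold while their given label does not, so they can be dropped or down-weighted before fairness-aware training.

```python
import numpy as np

def flag_label_issues(pred_probs, noisy_labels):
    """Flag likely label issues in the spirit of confident learning: class j's
    self-confidence threshold is the mean predicted probability for class j
    over examples currently labeled j; an example is flagged when at least one
    class clears its threshold but the example's own label does not."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    n_classes = pred_probs.shape[1]
    thresholds = np.array(
        [pred_probs[noisy_labels == j, j].mean() for j in range(n_classes)]
    )
    issues = np.zeros(len(noisy_labels), dtype=bool)
    for i, y in enumerate(noisy_labels):
        confident = np.flatnonzero(pred_probs[i] >= thresholds)
        issues[i] = confident.size > 0 and y not in confident
    return issues

# pred_probs should be out-of-sample (e.g. cross-validated) class probabilities;
# flagged rows can be removed or down-weighted before training a fair model.
```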
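Similarly, for "Unsupervised Learning of Debiased Representations with Pseudo-Attributes", the recipe of clustering an embedding space into pseudo-attributes and reweighting clusters can be sketched as follows. This is an illustrative reconstruction from the summary, not the authors' implementation; it assumes scikit-learn's KMeans for the clustering step and equal total weight per (label, cluster) group for the reweighting step.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_attribute_weights(embeddings, labels, n_clusters=8, seed=0):
    """Cluster the feature embeddings, treat cluster ids as pseudo-attributes,
    and give each (label, cluster) group the same total weight, so rare groups
    are up-weighted relative to dominant ones."""
    labels = np.asarray(labels)
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(embeddings)
    weights = np.ones(len(labels), dtype=float)
    for y in np.unique(labels):
        for c in np.unique(clusters):
            mask = (labels == y) & (clusters == c)
            if mask.any():
                weights[mask] = 1.0 / mask.sum()
    weights *= len(labels) / weights.sum()  # rescale so the mean weight is 1
    return clusters, weights

# The per-example weights can be fed to a weighted loss or to sklearn-style
# estimators through their sample_weight argument.
```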
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.