More Data Can Lead Us Astray: Active Data Acquisition in the Presence of
Label Bias
- URL: http://arxiv.org/abs/2207.07723v1
- Date: Fri, 15 Jul 2022 19:30:50 GMT
- Title: More Data Can Lead Us Astray: Active Data Acquisition in the Presence of
Label Bias
- Authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
- Abstract summary: Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
- Score: 7.506786114760462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increased awareness concerning risks of algorithmic bias has driven a
surge of efforts around bias mitigation strategies. A vast majority of the
proposed approaches fall under one of two categories: (1) imposing algorithmic
fairness constraints on predictive models, and (2) collecting additional
training samples. Most recently and at the intersection of these two
categories, methods that propose active learning under fairness constraints
have been developed. However, proposed bias mitigation strategies typically
overlook the bias present in the observed labels. In this work, we study
fairness considerations of active data collection strategies in the presence of
label bias. We first present an overview of different types of label bias in
the context of supervised learning systems. We then empirically show that, when
overlooking label bias, collecting more data can aggravate bias, and imposing
fairness constraints that rely on the observed labels in the data collection
process may not address the problem. Our results illustrate the unintended
consequences of deploying a model that attempts to mitigate a single type of
bias while neglecting others, emphasizing the importance of explicitly
differentiating between the types of bias that fairness-aware algorithms aim to
address, and highlighting the risks of neglecting label bias during data
collection.
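The claim that collecting more data can aggravate bias is easy to reproduce in a toy simulation. The sketch below is our own illustration, not the authors' code: the flip rate, two-group setup, and function names are all assumptions. It injects one-sided label noise into one group and shows that the positive-rate gap measured on observed labels converges to a nonzero value as the sample grows, rather than vanishing:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, flip_rate=0.3):
    """Draw n examples split across two groups.

    True labels are balanced in both groups, but for group 1 each
    positive label is flipped to negative with probability flip_rate,
    mimicking a biased historical decision-maker (label bias).
    """
    group = rng.integers(0, 2, size=n)
    y_true = rng.integers(0, 2, size=n)
    flip = (group == 1) & (y_true == 1) & (rng.random(n) < flip_rate)
    y_obs = np.where(flip, 0, y_true)
    return group, y_true, y_obs

def positive_rate_gap(group, y):
    """Positive-label rate of group 0 minus that of group 1."""
    return y[group == 0].mean() - y[group == 1].mean()

# Collecting more data sharpens the estimate of the *biased* gap;
# it does not shrink the gap toward its true value of zero.
for n in (1_000, 100_000):
    group, y_true, y_obs = sample(n)
    print(f"n={n:>6}: true gap {positive_rate_gap(group, y_true):+.3f}, "
          f"observed gap {positive_rate_gap(group, y_obs):+.3f}")
```

Any model or fairness constraint fit to `y_obs` inherits this gap no matter how many additional samples are collected, which is the paper's point about fairness constraints that rely on observed labels.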
Related papers
- Mitigating Label Bias in Machine Learning: Fairness through Confident
Learning [22.031325797588476]
Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias.
In this paper, we demonstrate that it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning.
arXiv Detail & Related papers (2023-12-14T08:55:38Z)
- Improving Bias Mitigation through Bias Experts in Natural Language
Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
- Causality and Independence Enhancement for Biased Node Classification [56.38828085943763]
We propose a novel Causality and Independence Enhancement (CIE) framework applicable to various graph neural networks (GNNs).
Our approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations.
Our CIE approach not only significantly enhances the performance of GNNs but also outperforms state-of-the-art debiased node classification methods.
arXiv Detail & Related papers (2023-10-14T13:56:24Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach to auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Prisoners of Their Own Devices: How Models Induce Data Bias in
Performative Prediction [4.874780144224057]
A biased model can make decisions that disproportionately harm certain groups in society.
Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones.
We propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour.
arXiv Detail & Related papers (2022-06-27T10:56:04Z)
- Information-Theoretic Bias Reduction via Causal View of Spurious
Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z)
- Learning Debiased Models with Dynamic Gradient Alignment and
Bias-conflicting Sample Mining [39.00256193731365]
Deep neural networks notoriously suffer from dataset biases which are detrimental to model robustness, generalization and fairness.
We propose a two-stage debiasing scheme to combat intractable unknown biases.
arXiv Detail & Related papers (2021-11-25T14:50:10Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Bias-Tolerant Fair Classification [20.973916494320246]
Label bias and selection bias are two sources of bias in data that hinder the fairness of machine-learning outcomes.
We propose a Bias-Tolerant Fair Regularized Loss (B-FARL), which tries to regain the benefits of data affected by label bias and selection bias.
B-FARL takes the biased data as input and learns a model that approximates one trained on the fair but latent data, thus preventing discrimination without requiring explicit fairness constraints.
arXiv Detail & Related papers (2021-07-07T13:31:38Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.