Fairness and Bias in Truth Discovery Algorithms: An Experimental
Analysis
- URL: http://arxiv.org/abs/2304.12573v1
- Date: Tue, 25 Apr 2023 04:56:35 GMT
- Title: Fairness and Bias in Truth Discovery Algorithms: An Experimental
Analysis
- Authors: Simone Lazier, Saravanan Thirumuruganathan, Hadis Anahideh
- Abstract summary: Crowd workers may sometimes provide unreliable labels.
Truth discovery (TD) algorithms are applied to determine the consensus labels from conflicting worker responses.
We conduct a systematic study of the bias and fairness of TD algorithms.
- Score: 7.575734557466221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) based approaches are increasingly being used in a
number of applications with societal impact. Training ML models often require
vast amounts of labeled data, and crowdsourcing is a dominant paradigm for
obtaining labels from multiple workers. Crowd workers may sometimes provide
unreliable labels, and to address this, truth discovery (TD) algorithms such as
majority voting are applied to determine the consensus labels from conflicting
worker responses. However, it is important to note that these consensus labels
may still be biased based on sensitive attributes such as gender, race, or
political affiliation. Even when sensitive attributes are not involved, the
labels can be biased due to different perspectives of subjective aspects such
as toxicity. In this paper, we conduct a systematic study of the bias and
fairness of TD algorithms. Our findings using two existing crowd-labeled
datasets reveal that a non-trivial proportion of workers provide biased
results, and using simple approaches for TD is sub-optimal. Our study also
demonstrates that popular TD algorithms are not a panacea. Additionally, we
quantify the impact of these unfair workers on downstream ML tasks and show
that conventional methods for achieving fairness and correcting label biases
are ineffective in this setting. We end the paper with a plea for the design of
novel bias-aware truth discovery algorithms that can ameliorate these issues.
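The abstract centers on two steps: deriving consensus labels from conflicting worker responses (e.g., by majority voting) and then asking whether those consensus labels are biased with respect to a sensitive attribute. The sketch below illustrates both steps on a toy example; the data layout, function names, and the demographic-parity gap used as the bias probe are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: toy majority-voting truth discovery followed by a
# simple group-bias probe (demographic parity gap) on the consensus labels.
# The data layout and the choice of metric are assumptions, not the paper's code.
from collections import Counter, defaultdict

def majority_vote(annotations):
    """annotations: iterable of (item_id, worker_id, label) tuples.
    Returns {item_id: consensus_label}; ties are broken arbitrarily."""
    votes = defaultdict(Counter)
    for item_id, _worker, label in annotations:
        votes[item_id][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

def demographic_parity_gap(consensus, group):
    """consensus: {item_id: 0/1 label}; group: {item_id: sensitive-attribute value}.
    Returns the largest difference in positive-label rates across groups."""
    rates = []
    for g in set(group.values()):
        members = [i for i in consensus if group[i] == g]
        rates.append(sum(consensus[i] for i in members) / len(members))
    return max(rates) - min(rates)

# Toy example: three workers label four items; items c and d belong to group g2.
annotations = [
    ("a", "w1", 1), ("a", "w2", 1), ("a", "w3", 0),
    ("b", "w1", 1), ("b", "w2", 1), ("b", "w3", 1),
    ("c", "w1", 0), ("c", "w2", 1), ("c", "w3", 0),
    ("d", "w1", 0), ("d", "w2", 0), ("d", "w3", 0),
]
group = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}

consensus = majority_vote(annotations)           # {'a': 1, 'b': 1, 'c': 0, 'd': 0}
print(demographic_parity_gap(consensus, group))  # 1.0: all g1 items positive, all g2 negative
```

A large gap here only flags disparate consensus rates; whether that reflects biased workers or genuine base-rate differences is the kind of question the paper's experimental analysis investigates.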
Related papers
- Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors [28.869581543676947]
Unsupervised outlier detection (OD) has numerous applications in finance, security, etc.
This work aims to shed light on the possible sources of unfairness in OD by auditing detection models under different data-centric factors.
We find that the OD algorithms under study all exhibit fairness pitfalls, although they differ in which types of data bias they are more susceptible to.
arXiv Detail & Related papers (2024-08-24T20:35:32Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Mitigating Label Bias via Decoupled Confident Learning [14.001915875687862]
Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias.
Bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation.
We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias.
arXiv Detail & Related papers (2023-07-18T03:28:03Z)
- Diversity matters: Robustness of bias measurements in Wikidata [4.950095974653716]
We reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents.
We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender.
We show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender.
arXiv Detail & Related papers (2023-02-27T18:38:10Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias [7.506786114760462]
Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
arXiv Detail & Related papers (2022-07-15T19:30:50Z)
- Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making [14.905698014932488]
We propose a novel method based on a variational autoencoder for practical fair decision-making.
Our method learns an unbiased data representation leveraging both labeled and unlabeled data.
Our method converges to the optimal (fair) policy according to the ground-truth with low variance.
arXiv Detail & Related papers (2022-05-10T10:33:11Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting; a generic reweighting sketch appears after this list.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
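Among the related papers, the instance-reweighting idea in "Balancing out Bias" (above) is concrete enough to sketch. The snippet below shows one common form of instance reweighting, weighting each example by the inverse frequency of its (group, label) combination so that under-represented combinations count more during training; the data layout and weighting rule are illustrative assumptions, not necessarily that paper's exact method.

```python
# Hedged sketch of generic instance reweighting: weight each training example by
# the inverse frequency of its (group, label) pair. A common recipe, shown here
# only for illustration; not claimed to be the cited paper's exact scheme.
from collections import Counter

def inverse_frequency_weights(groups, labels):
    """groups, labels: parallel lists; returns one weight per example,
    normalized so the weights average to 1."""
    counts = Counter(zip(groups, labels))
    raw = [1.0 / counts[(g, y)] for g, y in zip(groups, labels)]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

groups = ["g1", "g1", "g1", "g2"]
labels = [1, 1, 0, 0]
print(inverse_frequency_weights(groups, labels))
# (g1,1) occurs twice -> lower weight; (g1,0) and (g2,0) occur once -> higher weight
```

These per-example weights would typically be passed to the training loss (e.g., as sample weights), so that the model does not simply absorb correlations between author demographics and labels.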
This list is automatically generated from the titles and abstracts of the papers on this site.