Fairness and Bias in Truth Discovery Algorithms: An Experimental
Analysis
- URL: http://arxiv.org/abs/2304.12573v1
- Date: Tue, 25 Apr 2023 04:56:35 GMT
- Title: Fairness and Bias in Truth Discovery Algorithms: An Experimental
Analysis
- Authors: Simone Lazier, Saravanan Thirumuruganathan, Hadis Anahideh
- Abstract summary: Crowd workers may sometimes provide unreliable labels.
Truth discovery (TD) algorithms are applied to determine the consensus labels from conflicting worker responses.
We conduct a systematic study of the bias and fairness of TD algorithms.
- Score: 7.575734557466221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) based approaches are increasingly being used in a
number of applications with societal impact. Training ML models often require
vast amounts of labeled data, and crowdsourcing is a dominant paradigm for
obtaining labels from multiple workers. Crowd workers may sometimes provide
unreliable labels, and to address this, truth discovery (TD) algorithms such as
majority voting are applied to determine the consensus labels from conflicting
worker responses. However, it is important to note that these consensus labels
may still be biased based on sensitive attributes such as gender, race, or
political affiliation. Even when sensitive attributes are not involved, the
labels can be biased due to different perspectives of subjective aspects such
as toxicity. In this paper, we conduct a systematic study of the bias and
fairness of TD algorithms. Our findings using two existing crowd-labeled
datasets reveal that a non-trivial proportion of workers provide biased
results, and using simple approaches for TD is sub-optimal. Our study also
demonstrates that popular TD algorithms are not a panacea. Additionally, we
quantify the impact of these unfair workers on downstream ML tasks and show
that conventional methods for achieving fairness and correcting label biases
are ineffective in this setting. We end the paper with a plea for the design of
novel bias-aware truth discovery algorithms that can ameliorate these issues.
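The abstract centers on two steps: deriving consensus labels from conflicting worker responses (e.g., by majority voting) and then asking whether those consensus labels are biased with respect to a sensitive attribute. The sketch below illustrates both steps on a toy example; the data layout, function names, and the demographic-parity gap used as the bias probe are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: toy majority-voting truth discovery followed by a
# simple group-bias probe (demographic parity gap) on the consensus labels.
# The data layout and the choice of metric are assumptions, not the paper's code.
from collections import Counter, defaultdict

def majority_vote(annotations):
    """annotations: iterable of (item_id, worker_id, label) tuples.
    Returns {item_id: consensus_label}; ties are broken arbitrarily."""
    votes = defaultdict(Counter)
    for item_id, _worker, label in annotations:
        votes[item_id][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

def demographic_parity_gap(consensus, group):
    """consensus: {item_id: 0/1 label}; group: {item_id: sensitive-attribute value}.
    Returns the largest difference in positive-label rates across groups."""
    rates = []
    for g in set(group.values()):
        members = [i for i in consensus if group[i] == g]
        rates.append(sum(consensus[i] for i in members) / len(members))
    return max(rates) - min(rates)

# Toy example: three workers label four items; items c and d belong to group g2.
annotations = [
    ("a", "w1", 1), ("a", "w2", 1), ("a", "w3", 0),
    ("b", "w1", 1), ("b", "w2", 1), ("b", "w3", 1),
    ("c", "w1", 0), ("c", "w2", 1), ("c", "w3", 0),
    ("d", "w1", 0), ("d", "w2", 0), ("d", "w3", 0),
]
group = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}

consensus = majority_vote(annotations)           # {'a': 1, 'b': 1, 'c': 0, 'd': 0}
print(demographic_parity_gap(consensus, group))  # 1.0: all g1 items positive, all g2 negative
```

A large gap here only flags disparate consensus rates; whether that reflects biased workers or genuine base-rate differences is the kind of question the paper's experimental analysis investigates.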
Related papers
- Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors [28.869581543676947]
Unsupervised outlier detection (OD) has numerous applications in finance, security, etc.
This work aims to shed light on the possible sources of unfairness in OD by auditing detection models under different data-centric factors.
We find that the OD algorithms under study all exhibit fairness pitfalls, although they differ in which types of data bias they are more susceptible to.
arXiv Detail & Related papers (2024-08-24T20:35:32Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Mitigating Label Bias via Decoupled Confident Learning [14.001915875687862]
Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias.
Bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation.
We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias.
arXiv Detail & Related papers (2023-07-18T03:28:03Z)
- Diversity matters: Robustness of bias measurements in Wikidata [4.950095974653716]
We reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents.
We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender.
We show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender.
arXiv Detail & Related papers (2023-02-27T18:38:10Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias [7.506786114760462]
Proposed bias mitigation strategies typically overlook the bias present in the observed labels.
We first present an overview of different types of label bias in the context of supervised learning systems.
We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem.
arXiv Detail & Related papers (2022-07-15T19:30:50Z)
- Don't Throw it Away! The Utility of Unlabeled Data in Fair Decision Making [14.905698014932488]
We propose a novel method based on a variational autoencoder for practical fair decision-making.
Our method learns an unbiased data representation leveraging both labeled and unlabeled data.
Our method converges to the optimal (fair) policy according to the ground-truth with low variance.
arXiv Detail & Related papers (2022-05-10T10:33:11Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting; a generic reweighting sketch appears after this list.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
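Among the related papers, the instance-reweighting idea in "Balancing out Bias" (above) is concrete enough to sketch. The snippet below shows one common form of instance reweighting, weighting each example by the inverse frequency of its (group, label) combination so that under-represented combinations count more during training; the data layout and weighting rule are illustrative assumptions, not necessarily that paper's exact method.

```python
# Hedged sketch of generic instance reweighting: weight each training example by
# the inverse frequency of its (group, label) pair. A common recipe, shown here
# only for illustration; not claimed to be the cited paper's exact scheme.
from collections import Counter

def inverse_frequency_weights(groups, labels):
    """groups, labels: parallel lists; returns one weight per example,
    normalized so the weights average to 1."""
    counts = Counter(zip(groups, labels))
    raw = [1.0 / counts[(g, y)] for g, y in zip(groups, labels)]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

groups = ["g1", "g1", "g1", "g2"]
labels = [1, 1, 0, 0]
print(inverse_frequency_weights(groups, labels))
# (g1,1) occurs twice -> lower weight; (g1,0) and (g2,0) occur once -> higher weight
```

These per-example weights would typically be passed to the training loss (e.g., as sample weights), so that the model does not simply absorb correlations between author demographics and labels.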
This list is automatically generated from the titles and abstracts of the papers on this site.