Mitigating Observation Biases in Crowdsourced Label Aggregation
- URL: http://arxiv.org/abs/2302.13100v1
- Date: Sat, 25 Feb 2023 15:19:13 GMT
- Title: Mitigating Observation Biases in Crowdsourced Label Aggregation
- Authors: Ryosuke Ueda, Koh Takeuchi, Hisashi Kashima
- Abstract summary: One of the technical challenges in obtaining high-quality results from crowdsourcing is dealing with the variability and bias that arise because the work is executed by humans.
In this study, we focus on the observation bias in crowdsourcing.
Worker response frequencies and task complexities vary, which may affect the aggregation results.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing has been widely used to efficiently obtain labeled datasets for
supervised learning from large numbers of human resources at low cost. However,
one of the technical challenges in obtaining high-quality results from
crowdsourcing is dealing with the variability and bias that arise because the
work is executed by humans, and various studies have addressed this issue to
improve the quality by integrating redundantly collected responses. In this
study, we focus on observation bias in crowdsourcing: worker response
frequencies and task complexities vary, which may bias the aggregation results
when they are correlated with the quality of the responses. We propose
statistical aggregation methods for crowdsourcing responses that are combined
with an observational data bias removal method used
in causal inference. Through experiments using both synthetic and real datasets
with/without artificially injected spam and colluding workers, we verify that
the proposed method improves the aggregation accuracy in the presence of strong
observation biases and is robust to both spam and colluding workers.
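The abstract does not spell out the estimator, but a standard observational bias removal technique from causal inference is inverse propensity weighting (IPW). A minimal sketch of the idea, assuming IPW weights applied to a weighted majority vote (the function name, data, and propensity values are illustrative, not the paper's actual method):

```python
import numpy as np

def ipw_majority_vote(responses, propensities, n_classes):
    """Weighted majority vote with inverse-propensity weights.

    responses: list of (task_id, worker_id, label) triples.
    propensities: dict mapping (task_id, worker_id) to the estimated
        probability that the pair was observed; rarely observed pairs
        get larger weight, as in IPW estimators from causal inference.
    """
    responses = list(responses)
    n_tasks = max(t for t, _, _ in responses) + 1
    votes = np.zeros((n_tasks, n_classes))
    for task, worker, label in responses:
        # Each vote is up-weighted by 1/propensity so that tasks and
        # workers observed less often are not drowned out.
        votes[task, label] += 1.0 / propensities[(task, worker)]
    return votes.argmax(axis=1)

responses = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
             (1, 0, 0), (1, 2, 0)]
propensities = {(0, 0): 0.9, (0, 1): 0.5, (0, 2): 0.5,
                (1, 0): 0.9, (1, 2): 0.5}
print(ipw_majority_vote(responses, propensities, n_classes=2))  # [1 0]
```

Without the weights, task 0 would be a near-tie; the weights down-weight the frequently responding worker 0 and recover label 1.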
Related papers
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z) - Data Quality in Crowdsourcing and Spamming Behavior Detection [2.6481162211614118]
We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition.
A spammer index is proposed to assess overall data consistency, and two metrics are developed to measure each crowd worker's credibility.
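The paper's spammer index is built on variance decomposition, which needs more machinery than fits here; as a hedged stand-in, a simpler agreement-based credibility score illustrates the goal of flagging spammer-like workers (all names and data are illustrative):

```python
import numpy as np

def worker_credibility(labels):
    """Score each worker by agreement with the per-task majority vote;
    persistently low agreement is a crude spammer signal.
    labels: (n_tasks, n_workers) binary matrix, every cell filled."""
    majority = (labels.mean(axis=1) >= 0.5).astype(int)
    return (labels == majority[:, None]).mean(axis=0)

labels = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [1, 1, 0]])
print(worker_credibility(labels))  # [1. 1. 0.]
```

Worker 2 disagrees with the majority on every task and scores zero, the kind of behavior a spammer index is meant to surface.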
arXiv Detail & Related papers (2024-04-04T02:21:38Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
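The blurb only says that synthetic samples are generated by mixing minority and majority samples; a one-shot, mixup-style sketch of that mixing step (the iterative scheme and all parameter choices here are assumptions, not the paper's algorithm):

```python
import numpy as np

def mix_samples(x_min, x_maj, n_new, rng, alpha=0.5):
    """Create n_new synthetic samples as convex combinations of a
    randomly chosen minority sample and a randomly chosen majority
    sample, with Beta-distributed mixing coefficients."""
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    i = rng.integers(len(x_min), size=n_new)
    j = rng.integers(len(x_maj), size=n_new)
    return lam * x_min[i] + (1.0 - lam) * x_maj[j]

rng = np.random.default_rng(0)
x_min = np.array([[0.0, 0.0], [0.1, 0.1]])
x_maj = np.array([[1.0, 1.0], [0.9, 0.8]])
synthetic = mix_samples(x_min, x_maj, n_new=4, rng=rng)
```

Each synthetic point lies on the segment between its two parents, populating the region between the minority and majority classes.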
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Fairness Improves Learning from Noisily Labeled Long-Tailed Data [119.0612617460727]
Long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning.
We introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations.
We show that the introduced fairness regularizer improves the performances of sub-populations on the tail and the overall learning performance.
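The blurb describes regularizing the performance gap between sub-populations; a toy sketch of that idea, penalizing the gap between the best- and worst-off groups (this is not the paper's exact Fairness Regularizer, and the weighting is an assumption):

```python
import numpy as np

def fairness_regularized_loss(losses, groups, lam=0.5):
    """Mean loss plus a penalty on the gap between the best- and
    worst-off sub-populations, a simple instance of regularizing
    the performance gap between any two groups."""
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    means = [losses[groups == g].mean() for g in np.unique(groups)]
    return losses.mean() + lam * (max(means) - min(means))

print(fairness_regularized_loss([1.0, 1.0, 3.0, 3.0], [0, 0, 1, 1]))  # 3.0
```

Minimizing this objective pushes the model to shrink the tail group's loss rather than only the average, matching the blurb's claim about improving tail sub-populations.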
arXiv Detail & Related papers (2023-03-22T03:46:51Z) - FedRN: Exploiting k-Reliable Neighbors Towards Robust Federated Learning [15.101940747707701]
FedRN exploits k-reliable neighbors with high data expertise or similarity.
Compared with existing robust training methods, the results show that FedRN significantly improves the test accuracy in the presence of noisy labels.
arXiv Detail & Related papers (2022-05-03T05:09:08Z) - Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
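The paper's derivation covers sampling without replacement; as a simpler hedged sketch, the with-replacement version of importance-sampled agent selection with inverse-probability reweighting looks like this (sizes, seed, and names are illustrative):

```python
import numpy as np

def sample_agents(sizes, k, rng):
    """Sample k agent indices with probability proportional to `sizes`
    (e.g., local dataset sizes) and return inverse-probability weights
    so that (1/k) * sum_j w[j] * update[idx[j]] is an unbiased estimate
    of the uniform average update over all N agents."""
    p = np.asarray(sizes, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=k, replace=True, p=p)
    return idx, 1.0 / (len(p) * p[idx])

rng = np.random.default_rng(0)
idx, w = sample_agents([10, 50, 200, 40], k=2, rng=rng)
```

Uniform sampling is the special case where all sizes are equal and every weight is 1, which recovers plain FedAvg averaging over the selected agents.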
arXiv Detail & Related papers (2020-10-26T14:15:33Z) - Relabel the Noise: Joint Extraction of Entities and Relations via
Cooperative Multiagents [52.55119217982361]
We propose a joint extraction approach to handle noisy instances with a group of cooperative multiagents.
To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates the instance by calculating a continuous confidence score from its own perspective.
A confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels.
arXiv Detail & Related papers (2020-04-21T12:03:04Z) - Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.