A Worker-Task Specialization Model for Crowdsourcing: Efficient
Inference and Fundamental Limits
- URL: http://arxiv.org/abs/2111.12550v3
- Date: Wed, 13 Sep 2023 05:56:46 GMT
- Title: A Worker-Task Specialization Model for Crowdsourcing: Efficient
Inference and Fundamental Limits
- Authors: Doyeon Kim, Jeonghwan Lee and Hye Won Chung
- Abstract summary: Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost using non-expert workers.
In this paper, we consider a new model, called the $d$-type specialization model, in which each task and worker has its own (unknown) type.
We propose label inference algorithms achieving the order-wise optimal limit even when the types of tasks or those of workers are unknown.
- Score: 20.955889997204693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing systems have emerged as an effective platform for labeling
data at relatively low cost using non-expert workers. Inferring the correct labels
from multiple noisy answers, however, is a challenging problem, since the quality
of the answers varies widely across tasks and workers. Many existing works assume
a fixed ordering of workers in terms of their skill levels, and focus on
estimating worker skills so that answers from different workers can be aggregated
with different weights. In practice, however, worker skill varies widely across
tasks, especially when the tasks are heterogeneous. In this paper, we consider a
new model, called the $d$-type specialization model, in which each task and each
worker has its own (unknown) type, and the reliability of a worker's answer
depends on both the type of the given task and the type of the worker. We allow
the number $d$ of types to scale with the number of tasks. For this model, we
characterize the optimal sample complexity for correctly inferring the labels
within any given accuracy, and propose label inference algorithms that achieve
this order-wise optimal limit even when the types of tasks or those of workers
are unknown. Experiments on both synthetic and real datasets show that our
algorithm outperforms existing algorithms developed under stricter model
assumptions.
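The aggregation idea at the core of this setting can be illustrated with a small simulation. The sketch below is not the paper's inference algorithm: it assumes an oracle that already knows every worker-task reliability, uses a simplified two-level reliability (p_high when the worker and task types match, p_low otherwise), and all parameter values are made up for illustration. It compares plain majority voting with reliability-weighted voting under a toy d-type setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of a d-type setup (all values illustrative): each task
# and worker has a latent type; a worker answers a task of its own type
# correctly with probability p_high, and any other task with p_low.
d, n_tasks, n_workers = 3, 300, 15
p_high, p_low = 0.9, 0.5

task_type = rng.integers(d, size=n_tasks)
worker_type = rng.integers(d, size=n_workers)
labels = rng.integers(2, size=n_tasks) * 2 - 1  # ground truth in {-1, +1}

# Every worker answers every task; correctness depends on the type match.
p_correct = np.where(worker_type[None, :] == task_type[:, None], p_high, p_low)
correct = rng.random((n_tasks, n_workers)) < p_correct
answers = np.where(correct, labels[:, None], -labels[:, None])

# Plain majority voting ignores reliability entirely.
mv = np.sign(answers.sum(axis=1))

# Oracle weighted voting: each answer is weighted by its log-likelihood
# ratio log(p / (1 - p)); with p_low = 0.5, off-type answers get weight 0.
w = np.log(p_correct / (1.0 - p_correct))
wmv = np.sign((w * answers).sum(axis=1))

acc_mv = (mv == labels).mean()
acc_wmv = (wmv == labels).mean()
print(f"majority voting:           {acc_mv:.3f}")
print(f"reliability-weighted vote: {acc_wmv:.3f}")
```

The oracle weights here stand in for exactly what the paper's algorithms must estimate: the inference problem is to recover the type structure and reliabilities from the noisy answers alone.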
Related papers
- Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types [7.788574428889243]
The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms.
We show that different weights for different types are necessary for a multi-type model.
Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms.
arXiv Detail & Related papers (2023-02-14T23:30:39Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain-specific solutions to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
- Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers? [7.403065976821757]
We model workers as four types based on their ability: expert, normal worker, sloppy worker, and spammer.
We score workers' ability mainly on tasks of medium difficulty, then reduce the weight of answers from sloppy workers and modify the answers from spammers.
arXiv Detail & Related papers (2022-04-26T05:15:58Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Crowdsourcing with Meta-Workers: A New Way to Save the Budget [50.04836252733443]
We introduce the concept of the meta-worker, a machine annotator trained by meta-learning for the types of tasks that are well suited to AI.
Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free.
arXiv Detail & Related papers (2021-11-07T12:40:29Z)
- Robust Deep Learning from Crowds with Belief Propagation [6.643082745560235]
A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers.
In many cases, one needs a predictive model that works on unseen data, trained directly from crowdsourced datasets rather than from the true labels.
We propose a new data-generating process, where a neural network generates the true labels from task features.
arXiv Detail & Related papers (2021-11-01T07:20:16Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria quantifying the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Variational Bayesian Inference for Crowdsourcing Predictions [6.878219199575748]
We develop a variational Bayesian technique for two different worker noise models.
Our evaluations on synthetic and real-world datasets demonstrate that these approaches perform significantly better than existing non-Bayesian approaches.
arXiv Detail & Related papers (2020-06-01T08:11:50Z)
- Crowdsourced Labeling for Worker-Task Specialization Model [14.315501760755605]
We consider crowdsourced labeling under a $d$-type worker-task specialization model.
We design an inference algorithm that recovers binary task labels by using worker clustering, worker skill estimation and weighted majority voting.
arXiv Detail & Related papers (2020-03-21T13:27:03Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
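Two of the crowdsourcing entries above (the spectral-clustering paper and the worker-task specialization paper) rely on a common first step: grouping workers by type from the answer matrix alone, before any label is estimated. Below is a minimal sketch of that idea for two worker types, using pairwise agreement rates and a single eigenvector; the setup and all parameter values are invented for illustration and do not reproduce any of the papers' actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: two worker types, each reliable (0.9) on tasks of
# its own type and at chance (0.5) on the other type's tasks.
n_tasks, n_workers = 400, 20
task_type = rng.integers(2, size=n_tasks)
worker_type = np.array([0] * 10 + [1] * 10)
labels = rng.integers(2, size=n_tasks) * 2 - 1  # ground truth in {-1, +1}

p = np.where(worker_type[None, :] == task_type[:, None], 0.9, 0.5)
correct = rng.random((n_tasks, n_workers)) < p
answers = np.where(correct, labels[:, None], -labels[:, None])

# Pairwise agreement correlations: same-type workers agree more often
# because they are simultaneously accurate on the same subset of tasks.
agree = (answers.T @ answers) / n_tasks
np.fill_diagonal(agree, 0.0)

# Spectral step: after centering, the agreement matrix is approximately
# rank one, and its top eigenvector separates the two types by sign.
vals, vecs = np.linalg.eigh(agree - agree.mean())
cluster = (vecs[:, -1] > 0).astype(int)

# Clustering is only recoverable up to relabeling of the two groups.
acc = max((cluster == worker_type).mean(), (cluster != worker_type).mean())
print(f"worker clustering accuracy: {acc:.2f}")
```

Once workers are grouped this way, per-group reliabilities can be estimated and plugged into a weighted vote, which is the general shape of the clustering-then-weighting pipelines described in those entries.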
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.