Variational Bayesian Inference for Crowdsourcing Predictions
- URL: http://arxiv.org/abs/2006.00778v2
- Date: Tue, 2 Jun 2020 02:53:30 GMT
- Title: Variational Bayesian Inference for Crowdsourcing Predictions
- Authors: Desmond Cai, Duc Thien Nguyen, Shiau Hong Lim, Laura Wynter
- Abstract summary: We develop a variational Bayesian technique for two different worker noise models.
Our evaluations on synthetic and real-world datasets demonstrate that these approaches perform significantly better than existing non-Bayesian approaches.
- Score: 6.878219199575748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing has emerged as an effective means for performing a number of
machine learning tasks such as annotation and labelling of images and other
data sets. In most early settings of crowdsourcing, the task involved
classification, that is, assigning one of a discrete set of labels to each item.
Recently, however, more complex tasks have been attempted, including asking
crowdsource workers to assign continuous labels, or predictions. In essence,
this involves the use of crowdsourcing for function estimation. We are
motivated by this problem to drive applications such as collaborative
prediction, that is, harnessing the wisdom of the crowd to predict quantities
more accurately. To do so, we propose a Bayesian approach aimed specifically at
alleviating overfitting, a typical impediment to accurate prediction models in
practice. In particular, we develop a variational Bayesian technique for two
different worker noise models - one that assumes workers' noises are
independent and the other that assumes workers' noises have a latent low-rank
structure. Our evaluations on synthetic and real-world datasets demonstrate
that these Bayesian approaches perform significantly better than existing
non-Bayesian approaches and are thus potentially useful for this class of
crowdsourcing problems.
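To make the two worker noise models concrete, below is a minimal generative sketch in NumPy. This is our own illustration, not the authors' code; the variable names, sizes, and exact Gaussian forms are assumptions. In the independent model, each worker adds i.i.d. noise with its own variance; in the low-rank model, workers' noises are correlated through a shared low-rank covariance. A full variational Bayesian treatment would additionally place priors on these quantities and optimize an evidence lower bound, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_workers, rank = 100, 20, 3    # illustrative sizes

f = rng.normal(size=n_tasks)             # latent ground-truth values f(x_i)

# Model 1: independent worker noise, y_ij = f_i + eps_ij with eps_ij ~ N(0, sigma_j^2).
sigma = rng.uniform(0.1, 1.0, size=n_workers)       # per-worker noise scales
y_indep = f[:, None] + rng.normal(size=(n_tasks, n_workers)) * sigma

# Model 2: low-rank correlated noise, eps_i ~ N(0, V V^T + D) with V of rank r,
# so workers' errors share a few latent directions (e.g., common biases).
V = rng.normal(scale=0.5, size=(n_workers, rank))   # low-rank factor loadings
D = np.diag(rng.uniform(0.05, 0.2, size=n_workers)) # independent residual noise
cov = V @ V.T + D
y_lowrank = f[:, None] + rng.multivariate_normal(
    np.zeros(n_workers), cov, size=n_tasks)

print(y_indep.shape, y_lowrank.shape)    # (100, 20) (100, 20)
```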
Related papers
- Adaptive Crowdsourcing Via Self-Supervised Learning [20.393114559367202]
Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate.
We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme.
arXiv Detail & Related papers (2024-01-24T05:57:36Z)
- Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging, as a large number of weights need to be fine-tuned; as a result, the encoders forget information about previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
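As a rough sketch of the bottleneck idea (our own illustration under assumed shapes, not the paper's implementation): each encoder feature is quantized to its nearest learnable key, and the paired value code is passed downstream, so adapting to a new task only requires tuning the values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_codes, d_key, d_value = 64, 16, 8          # illustrative codebook size and dims

keys = rng.normal(size=(n_codes, d_key))      # keys, frozen after initialization
values = rng.normal(size=(n_codes, d_value))  # the only task-tuned parameters

def bottleneck(z):
    """Map encoder features z of shape (batch, d_key) to value codes via nearest key."""
    # Squared distance between each feature and every key.
    d2 = ((z[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                   # discrete code assignment
    return values[idx]                        # (batch, d_value), passed downstream

z = rng.normal(size=(5, d_key))               # stand-in for encoder outputs
print(bottleneck(z).shape)                    # (5, 8)
```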
arXiv Detail & Related papers (2022-07-22T17:52:30Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits [20.955889997204693]
Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost using non-expert workers.
In this paper, we consider a new model, called $d$-type specialization model, in which each task and worker has its own (unknown) type.
We propose label inference algorithms achieving the order-wise optimal limit even when the types of tasks or those of workers are unknown.
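A hedged sketch of what such a type-based model could look like (our own illustration; the reliability parameters and binary labels are assumptions): a worker answers tasks of its own type with high accuracy and tasks of other types near chance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_tasks, n_workers, d = 200, 30, 4        # illustrative sizes; d is the number of types

task_type = rng.integers(d, size=n_tasks)     # unknown task types
worker_type = rng.integers(d, size=n_workers) # unknown worker types
truth = rng.integers(2, size=n_tasks)         # binary ground-truth labels

# Assumed reliabilities: high when worker and task types match, near chance otherwise.
p_match, p_other = 0.9, 0.55
match = task_type[:, None] == worker_type[None, :]
p_correct = np.where(match, p_match, p_other)
answers = np.where(rng.random((n_tasks, n_workers)) < p_correct,
                   truth[:, None], 1 - truth[:, None])

# Naive majority-vote baseline; type-aware inference would weight workers by type.
estimate = (answers.mean(axis=1) > 0.5).astype(int)
print("majority-vote accuracy:", (estimate == truth).mean())
```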
arXiv Detail & Related papers (2021-11-19T05:32:59Z)
- Crowdsourcing with Meta-Workers: A New Way to Save the Budget [50.04836252733443]
We introduce the concept of a meta-worker, a machine annotator trained via meta-learning for the types of tasks that are well suited to AI.
Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free.
arXiv Detail & Related papers (2021-11-07T12:40:29Z)
- Robust Deep Learning from Crowds with Belief Propagation [6.643082745560235]
A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers.
In many cases, one needs a predictive model that works on unseen data, trained directly from crowdsourced datasets rather than from true labels.
We propose a new data-generating process, where a neural network generates the true labels from task features.
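A minimal sketch of such a data-generating process (our own illustration; the network shape and per-worker noise model are assumptions): a small random network maps task features to true labels, and workers report noisy copies of those labels.

```python
import numpy as np

rng = np.random.default_rng(3)
n_tasks, n_feats, n_workers = 500, 10, 15   # illustrative sizes

X = rng.normal(size=(n_tasks, n_feats))     # task features

# Tiny random MLP standing in for the label-generating network.
W1 = rng.normal(size=(n_feats, 32))
w2 = rng.normal(size=32)
truth = ((np.tanh(X @ W1) @ w2) > 0).astype(int)   # true labels from features

# Workers report the true label with per-worker reliability p_j.
p = rng.uniform(0.6, 0.95, size=n_workers)
flip = rng.random((n_tasks, n_workers)) >= p
answers = np.where(flip, 1 - truth[:, None], truth[:, None])
print(answers.shape)   # (500, 15)
```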
arXiv Detail & Related papers (2021-11-01T07:20:16Z)
- Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks [44.61070965407907]
Given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary.
We propose the Shifts dataset for evaluation of uncertainty estimates and robustness to distributional shift.
arXiv Detail & Related papers (2021-07-15T16:59:34Z)
- Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z)
- Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting Platforms [22.995941896769843]
We propose the use of a computational framework to identify clusters of underperforming workers using clickstream trajectories.
The framework can reveal different types of underperformers, such as workers with forecasts whose accuracy is far from the consensus of the crowd.
Our study suggests that clickstream clustering and analysis are fundamental tools to diagnose the performance of crowdworkers in platforms leveraging the wisdom of crowds.
arXiv Detail & Related papers (2020-09-04T00:26:38Z)
- Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.