Variational Bayesian Inference for Crowdsourcing Predictions
- URL: http://arxiv.org/abs/2006.00778v2
- Date: Tue, 2 Jun 2020 02:53:30 GMT
- Title: Variational Bayesian Inference for Crowdsourcing Predictions
- Authors: Desmond Cai, Duc Thien Nguyen, Shiau Hong Lim, Laura Wynter
- Abstract summary: We develop a variational Bayesian technique for two different worker noise models.
Our evaluations on synthetic and real-world datasets demonstrate that these approaches perform significantly better than existing non-Bayesian approaches.
- Score: 6.878219199575748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing has emerged as an effective means for performing a number of
machine learning tasks such as annotation and labelling of images and other
data sets. In most early settings of crowdsourcing, the task involved
classification, that is, assigning one of a discrete set of labels to each item.
Recently, however, more complex tasks have been attempted, including asking
crowdsource workers to assign continuous labels, or predictions. In essence,
this involves the use of crowdsourcing for function estimation. We are
motivated by this problem to drive applications such as collaborative
prediction, that is, harnessing the wisdom of the crowd to predict quantities
more accurately. To do so, we propose a Bayesian approach aimed specifically at
alleviating overfitting, a typical impediment to accurate prediction models in
practice. In particular, we develop a variational Bayesian technique for two
different worker noise models - one that assumes workers' noises are
independent and the other that assumes workers' noises have a latent low-rank
structure. Our evaluations on synthetic and real-world datasets demonstrate
that these Bayesian approaches perform significantly better than existing
non-Bayesian approaches and are thus potentially useful for this class of
crowdsourcing problems.
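To make the two worker noise models concrete, below is a minimal generative sketch in NumPy. This is our own illustration, not the authors' code; the variable names, sizes, and exact Gaussian forms are assumptions. In the independent model, each worker adds i.i.d. noise with its own variance; in the low-rank model, workers' noises are correlated through a shared low-rank covariance. A full variational Bayesian treatment would additionally place priors on these quantities and optimize an evidence lower bound, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_workers, rank = 100, 20, 3    # illustrative sizes

f = rng.normal(size=n_tasks)             # latent ground-truth values f(x_i)

# Model 1: independent worker noise, y_ij = f_i + eps_ij with eps_ij ~ N(0, sigma_j^2).
sigma = rng.uniform(0.1, 1.0, size=n_workers)       # per-worker noise scales
y_indep = f[:, None] + rng.normal(size=(n_tasks, n_workers)) * sigma

# Model 2: low-rank correlated noise, eps_i ~ N(0, V V^T + D) with V of rank r,
# so workers' errors share a few latent directions (e.g., common biases).
V = rng.normal(scale=0.5, size=(n_workers, rank))   # low-rank factor loadings
D = np.diag(rng.uniform(0.05, 0.2, size=n_workers)) # independent residual noise
cov = V @ V.T + D
y_lowrank = f[:, None] + rng.multivariate_normal(
    np.zeros(n_workers), cov, size=n_tasks)

print(y_indep.shape, y_lowrank.shape)    # (100, 20) (100, 20)
```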
Related papers
- Adaptive Crowdsourcing Via Self-Supervised Learning [20.393114559367202]
Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate.
We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme.
arXiv Detail & Related papers (2024-01-24T05:57:36Z)
- Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging, as a large number of weights need to be fine-tuned; as a result, the encoders forget information about previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
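As a rough sketch of the bottleneck idea (our own illustration under assumed shapes, not the paper's implementation): each encoder feature is quantized to its nearest learnable key, and the paired value code is passed downstream, so adapting to a new task only requires tuning the values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_codes, d_key, d_value = 64, 16, 8          # illustrative codebook size and dims

keys = rng.normal(size=(n_codes, d_key))      # keys, frozen after initialization
values = rng.normal(size=(n_codes, d_value))  # the only task-tuned parameters

def bottleneck(z):
    """Map encoder features z of shape (batch, d_key) to value codes via nearest key."""
    # Squared distance between each feature and every key.
    d2 = ((z[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                   # discrete code assignment
    return values[idx]                        # (batch, d_value), passed downstream

z = rng.normal(size=(5, d_key))               # stand-in for encoder outputs
print(bottleneck(z).shape)                    # (5, 8)
```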
arXiv Detail & Related papers (2022-07-22T17:52:30Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits [20.955889997204693]
Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost using non-expert workers.
In this paper, we consider a new model, called $d$-type specialization model, in which each task and worker has its own (unknown) type.
We propose label inference algorithms achieving the order-wise optimal limit even when the types of tasks or those of workers are unknown.
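A hedged sketch of what such a type-based model could look like (our own illustration; the reliability parameters and binary labels are assumptions): a worker answers tasks of its own type with high accuracy and tasks of other types near chance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_tasks, n_workers, d = 200, 30, 4        # illustrative sizes; d is the number of types

task_type = rng.integers(d, size=n_tasks)     # unknown task types
worker_type = rng.integers(d, size=n_workers) # unknown worker types
truth = rng.integers(2, size=n_tasks)         # binary ground-truth labels

# Assumed reliabilities: high when worker and task types match, near chance otherwise.
p_match, p_other = 0.9, 0.55
match = task_type[:, None] == worker_type[None, :]
p_correct = np.where(match, p_match, p_other)
answers = np.where(rng.random((n_tasks, n_workers)) < p_correct,
                   truth[:, None], 1 - truth[:, None])

# Naive majority-vote baseline; type-aware inference would weight workers by type.
estimate = (answers.mean(axis=1) > 0.5).astype(int)
print("majority-vote accuracy:", (estimate == truth).mean())
```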
arXiv Detail & Related papers (2021-11-19T05:32:59Z)
- Crowdsourcing with Meta-Workers: A New Way to Save the Budget [50.04836252733443]
We introduce the concept of a meta-worker, a machine annotator trained via meta-learning for the types of tasks that are well suited to AI.
Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free.
arXiv Detail & Related papers (2021-11-07T12:40:29Z)
- Robust Deep Learning from Crowds with Belief Propagation [6.643082745560235]
A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers.
In many cases, one needs a predictive model that works on unseen data, trained directly from crowdsourced datasets rather than from true labels.
We propose a new data-generating process, where a neural network generates the true labels from task features.
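A minimal sketch of such a data-generating process (our own illustration; the network shape and per-worker noise model are assumptions): a small random network maps task features to true labels, and workers report noisy copies of those labels.

```python
import numpy as np

rng = np.random.default_rng(3)
n_tasks, n_feats, n_workers = 500, 10, 15   # illustrative sizes

X = rng.normal(size=(n_tasks, n_feats))     # task features

# Tiny random MLP standing in for the label-generating network.
W1 = rng.normal(size=(n_feats, 32))
w2 = rng.normal(size=32)
truth = ((np.tanh(X @ W1) @ w2) > 0).astype(int)   # true labels from features

# Workers report the true label with per-worker reliability p_j.
p = rng.uniform(0.6, 0.95, size=n_workers)
flip = rng.random((n_tasks, n_workers)) >= p
answers = np.where(flip, 1 - truth[:, None], truth[:, None])
print(answers.shape)   # (500, 15)
```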
arXiv Detail & Related papers (2021-11-01T07:20:16Z)
- Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks [44.61070965407907]
Given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary.
We propose the Shifts dataset for evaluation of uncertainty estimates and robustness to distributional shift.
arXiv Detail & Related papers (2021-07-15T16:59:34Z)
- Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z)
- Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting Platforms [22.995941896769843]
We propose the use of a computational framework to identify clusters of underperforming workers using clickstream trajectories.
The framework can reveal different types of underperformers, such as workers with forecasts whose accuracy is far from the consensus of the crowd.
Our study suggests that clickstream clustering and analysis are fundamental tools to diagnose the performance of crowdworkers in platforms leveraging the wisdom of crowds.
arXiv Detail & Related papers (2020-09-04T00:26:38Z)
- Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.