Unsupervised Crowdsourcing with Accuracy and Cost Guarantees
- URL: http://arxiv.org/abs/2207.01988v1
- Date: Tue, 5 Jul 2022 12:14:11 GMT
- Authors: Yashvardhan Didwania, Jayakrishnan Nair, N. Hemachandra
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of cost-optimal utilization of a crowdsourcing
platform for binary, unsupervised classification of a collection of items,
given a prescribed error threshold. Workers on the crowdsourcing platform are
assumed to be divided into multiple classes, based on their skill, experience,
and/or past performance. We model each worker class via an unknown confusion
matrix, and a (known) price to be paid per label prediction. For this setting,
we propose algorithms for acquiring label predictions from workers, and for
inferring the true labels of items. We prove that if the number of (unlabeled)
items available is large enough, our algorithms satisfy the prescribed error
thresholds, incurring a cost that is near-optimal. Finally, we validate our
algorithms, and some heuristics inspired by them, through an extensive case
study.
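The setting above can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: the worker classes, confusion matrices, prices, and the naive majority-vote inference rule are all hypothetical choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two worker classes, each modeled by a 2x2 confusion
# matrix C where C[y, z] = P(predicted label z | true label y), plus a
# known price paid per label prediction.
worker_classes = [
    {"confusion": np.array([[0.9, 0.1], [0.2, 0.8]]), "price": 1.0},  # skilled
    {"confusion": np.array([[0.7, 0.3], [0.4, 0.6]]), "price": 0.3},  # cheap
]

def acquire_labels(true_label, worker, n):
    """Sample n noisy label predictions for one item from one worker class."""
    p = worker["confusion"][true_label]
    return rng.choice(2, size=n, p=p)

def infer_label(labels):
    """Naive inference: majority vote over the acquired predictions."""
    return int(labels.sum() * 2 >= len(labels))

true_labels = rng.choice(2, size=1000)   # unknown to the platform in practice
n_per_item = 5
worker = worker_classes[1]               # buy only cheap labels
cost = n_per_item * worker["price"] * len(true_labels)
preds = [infer_label(acquire_labels(y, worker, n_per_item)) for y in true_labels]
error = np.mean([p != y for p, y in zip(preds, true_labels)])
print(f"empirical error: {error:.3f}, total cost: {cost:.1f}")
```

The paper's contribution is choosing how many labels to buy, and from which worker class, so that the inferred labels meet a prescribed error threshold at near-optimal cost; the fixed allocation above only shows the trade-off being optimized.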
Related papers
- Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision [128.6645627461981]
We propose a new problem setting, MU-OPPO: binary classification from multiple unlabeled datasets given only one pairwise numerical relationship between class priors.
In MU-OPPO, the class priors need not be known for all unlabeled datasets.
We show that our framework yields smaller estimation errors for class priors and better binary classification performance.
arXiv Detail & Related papers (2023-06-12T11:33:46Z)
- Enhanced Nearest Neighbor Classification for Crowdsourcing [26.19048869302787]
Crowdsourcing is an economical way to label a large amount of data.
The noise in the produced labels may deteriorate the accuracy of any classification method applied to the labelled data.
We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue.
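The effect the entry describes can be seen with a plain k-nearest-neighbor baseline. This is a hedged sketch, not the paper's ENN classifier: the synthetic data, flip rate, and choice of k are assumptions made here to show how neighborhood voting averages out crowd-label noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: two Gaussian clusters; crowd labels flipped with prob 0.3.
n = 400
X = np.vstack([rng.normal(-1, 1, (n // 2, 2)), rng.normal(1, 1, (n // 2, 2))])
y_true = np.array([0] * (n // 2) + [1] * (n // 2))
flip = rng.random(n) < 0.3
y_crowd = np.where(flip, 1 - y_true, y_true)

def knn_predict(x, X, y, k):
    """Plain k-nearest-neighbor vote; a larger k averages out label noise."""
    d = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(d)[:k]
    return int(y[nearest].mean() >= 0.5)

X_test = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y_test = np.array([0] * 100 + [1] * 100)
for k in (1, 25):
    acc = np.mean([knn_predict(x, X, y_crowd, k) == yt
                   for x, yt in zip(X_test, y_test)])
    print(f"k={k}: accuracy {acc:.2f}")
```

With k=1 the classifier inherits the 30% label noise almost directly, while a larger neighborhood recovers most of the clean-data accuracy; the paper's ENN refines this idea further.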
arXiv Detail & Related papers (2022-02-26T22:53:52Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Learning with Proper Partial Labels [87.65718705642819]
Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z) - Active clustering for labeling training data [0.8029049649310211]
We propose a setting for training data gathering where the human experts perform the comparatively cheap task of answering pairwise queries.
We present algorithms that minimize the average number of queries required to cluster the items, and analyze their complexity.
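A minimal sketch of the query model, not the paper's algorithm: items are clustered by asking an oracle "are these two items in the same cluster?" and comparing each new item against one representative per discovered cluster. The ground-truth clustering here is a hypothetical example.

```python
# Hypothetical ground-truth clustering the oracle answers from.
true_cluster = [0, 0, 1, 2, 1, 0, 2, 1]

queries = 0
def same_cluster(i, j):
    """Pairwise query to the human expert; we count every query made."""
    global queries
    queries += 1
    return true_cluster[i] == true_cluster[j]

representatives = []   # one item per discovered cluster
assignment = {}
for item in range(len(true_cluster)):
    for rep in representatives:
        if same_cluster(item, rep):
            assignment[item] = assignment[rep]
            break
    else:
        # No representative matched: item starts a new cluster.
        assignment[item] = len(representatives)
        representatives.append(item)

print(f"clusters found: {len(representatives)}, queries: {queries}")
```

Each item costs at most one query per existing cluster, so the query count grows with the number of clusters rather than the number of items squared; the paper studies how to drive the average query count down further.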
arXiv Detail & Related papers (2021-10-27T15:35:58Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
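The reserved-budget idea can be sketched as follows. This is an illustrative toy, not the paper's method: the flip rate, the synthetic model confidences, and the disagreement score used to rank likely labeling errors are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setting: each item has one crowd label (10% flipped) and a
# model probability for class 1. We reserve a fraction of the annotation
# budget to re-label the crowd labels the model doubts most.
n = 200
y_true = rng.choice(2, size=n)
y_crowd = np.where(rng.random(n) < 0.10, 1 - y_true, y_true)
p_model = np.clip(y_true + rng.normal(0, 0.2, n), 0, 1)  # fairly confident model

# Disagreement score: model probability mass on the opposite of the crowd label.
doubt = np.where(y_crowd == 1, 1 - p_model, p_model)
budget = int(0.15 * n)                  # reserved relabeling budget
suspects = np.argsort(doubt)[-budget:]  # most doubted crowd labels

y_fixed = y_crowd.copy()
y_fixed[suspects] = y_true[suspects]    # relabeling assumed accurate here
print("errors before:", int((y_crowd != y_true).sum()),
      "after:", int((y_fixed != y_true).sum()))
```

Because flipped labels concentrate where the model disagrees with the crowd, a small reserved budget removes most of the errors, rather than spreading the same annotations uniformly over the dataset.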
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Cost-Accuracy Aware Adaptive Labeling for Active Learning [9.761953860259942]
In many real settings, different labelers have different labeling costs and can yield different labeling accuracies.
We propose a new algorithm for selecting instances, labelers and their corresponding costs and labeling accuracies.
Our proposed algorithm demonstrates state-of-the-art performance on five UCI datasets and a real crowdsourcing dataset.
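The cost/accuracy trade-off among labelers can be made concrete with a simple selection rule. This is a hedged sketch, not the paper's algorithm: the labeler pool, prices, accuracies, and the "cheapest feasible labeler" rule are hypothetical.

```python
# Hypothetical labelers with known per-label cost and accuracy.
labelers = [
    {"name": "expert", "cost": 5.0, "accuracy": 0.98},
    {"name": "trained", "cost": 1.5, "accuracy": 0.90},
    {"name": "crowd", "cost": 0.2, "accuracy": 0.75},
]

def pick_labeler(required_accuracy):
    """Pick the cheapest labeler whose accuracy meets the requirement."""
    feasible = [l for l in labelers if l["accuracy"] >= required_accuracy]
    return min(feasible, key=lambda l: l["cost"]) if feasible else None

for req in (0.7, 0.85, 0.95):
    chosen = pick_labeler(req)
    print(req, "->", chosen["name"] if chosen else "none")
```

Instances needing only modest confidence go to cheap crowd workers, while hard instances justify the expert's price; the paper's algorithm additionally learns the unknown accuracies while selecting instances.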
arXiv Detail & Related papers (2021-05-24T17:21:00Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
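The uncertainty half of this idea is standard entropy-based sampling, sketched below. This is not the paper's minimax algorithm; the pool of model probabilities and the batch size are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p):
    """Predictive entropy of Bernoulli probabilities p; high = uncertain."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Hypothetical pool: model probabilities for 10 unlabeled examples.
probs = rng.random(10)
batch = np.argsort(entropy(probs))[-3:]  # query the 3 most uncertain
print("query indices:", batch, "probs:", probs[batch])
```

Entropy is maximized for probabilities near 0.5, so this rule selects the examples the model is least sure about; the paper combines such uncertainty with a diversity term trained adversarially.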
arXiv Detail & Related papers (2020-12-18T19:03:40Z) - Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability to label.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z) - Global Multiclass Classification and Dataset Construction via
Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.