Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
- URL: http://arxiv.org/abs/2301.00006v2
- Date: Wed, 31 May 2023 08:40:07 GMT
- Title: Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
- Authors: Hyeonsu Jeong and Hye Won Chung
- Abstract summary: We consider crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability.
We propose a model in which the top two plausible answers for each task are distinguished from the rest of the choices.
Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability.
- Score: 10.508187462682308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing has emerged as an effective platform for labeling large amounts
of data in a cost- and time-efficient manner. Most previous work has focused on
designing an efficient algorithm to recover only the ground-truth labels of the
data. In this paper, we consider multi-choice crowdsourcing tasks with the goal
of recovering not only the ground truth, but also the most confusing answer and
the confusion probability. The most confusing answer provides useful
information about the task by revealing the most plausible answer other than
the ground truth and how plausible it is. To theoretically analyze such
scenarios, we propose a model in which the top two plausible answers for
each task are distinguished from the rest of the choices. Task difficulty is
quantified by the probability of confusion between the top two, and worker
reliability is quantified by the probability of giving an answer among the top
two. Under this model, we propose a two-stage inference algorithm to infer both
the top two answers and the confusion probability. We show that our algorithm
achieves the minimax optimal convergence rate. We conduct both synthetic and
real data experiments and demonstrate that our algorithm outperforms other
recent algorithms. We also show the applicability of our algorithms in
inferring the difficulty of tasks and in training neural networks with top-two
soft labels.
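To make the model concrete, here is a minimal simulation sketch in Python. All names (`g`, `c`, `q`, `p`, `draw_answer`) and the plurality-based estimate at the end are illustrative assumptions, not the paper's actual two-stage algorithm.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instantiation of the top-two model described above:
# each task i has a ground truth g[i] and a most confusing answer c[i] != g[i];
# q[i] is the confusion probability (task difficulty) and p[j] is the
# probability that worker j answers among the top two (worker reliability).
K, n_tasks, n_workers = 5, 200, 50
g = rng.integers(0, K, size=n_tasks)                    # ground-truth answers
c = (g + rng.integers(1, K, size=n_tasks)) % K          # most confusing answers (!= g)
q = rng.uniform(0.1, 0.4, size=n_tasks)                 # confusion probabilities
p = rng.uniform(0.5, 0.95, size=n_workers)              # worker reliabilities

def draw_answer(i, j):
    """Worker j answers task i among the top two w.p. p[j], else uniformly."""
    if rng.random() < p[j]:
        return c[i] if rng.random() < q[i] else g[i]
    return int(rng.integers(0, K))

A = np.array([[draw_answer(i, j) for j in range(n_workers)]
              for i in range(n_tasks)])

# A crude plurality-style estimate (not the paper's algorithm): take the two
# most frequent answers per task as the top two, and the share of top-two
# votes going to the runner-up as a naive confusion-probability estimate.
for i in range(3):
    counts = np.bincount(A[i], minlength=K)
    top1, top2 = np.argsort(counts)[::-1][:2]
    q_hat = counts[top2] / (counts[top1] + counts[top2])
    print(f"task {i}: top two = ({top1}, {top2}), q_hat = {q_hat:.2f}, true q = {q[i]:.2f}")
```
With many reliable workers, the two most frequent answers typically recover the ground truth and the most confusing answer, while the runner-up share loosely tracks the confusion probability.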
Related papers
- The Battleship Approach to the Low Resource Entity Matching Problem [0.0]
We propose a new active learning approach for entity matching problems.
We focus on a selection mechanism that exploits unique properties of entity matching.
An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
arXiv Detail & Related papers (2023-11-27T10:18:17Z)
- Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation [11.4375764457726]
This paper studies the neglected complementary problem of getting annotated data given a predictor.
For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods.
arXiv Detail & Related papers (2023-07-04T09:11:33Z)
- Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z)
- Efficient Approximate Recovery from Pooled Data Using Doubly Regular Pooling Schemes [1.7403133838762448]
We analyze an approximate reconstruction algorithm that estimates the hidden bits in a greedy fashion.
Our analysis is uniform in the degree of noise and the sparsity of $\sigma$.
arXiv Detail & Related papers (2023-02-28T19:31:40Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain-specific solutions to stay close to a central function; a minimal sketch follows this entry.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
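To illustrate the flavor of such a formulation, here is a minimal Python sketch in which per-task solutions are tied to a central function by a quadratic penalty. The parameter names and the penalty form are assumptions for illustration, not that paper's exact formulation.
```python
import numpy as np

rng = np.random.default_rng(2)
n_tasks, d, lam = 4, 3, 5.0

# Related linear-regression tasks: shared weights plus task-specific deviations.
X = [rng.standard_normal((30, d)) for _ in range(n_tasks)]
w_true = rng.standard_normal(d)
y = [x @ (w_true + 0.2 * rng.standard_normal(d)) for x in X]

theta = np.zeros((n_tasks, d))
for _ in range(100):
    theta_bar = theta.mean(axis=0)          # central function = mean of task solutions
    for i in range(n_tasks):
        # Closed form of argmin ||X theta - y||^2 + lam * ||theta - theta_bar||^2.
        A = X[i].T @ X[i] + lam * np.eye(d)
        theta[i] = np.linalg.solve(A, X[i].T @ y[i] + lam * theta_bar)

# Each task solution trades its own fit against staying near the center.
print(np.round(theta - theta.mean(axis=0), 3))
```
The penalty weight interpolates between the two extremes named in the summary: lam -> infinity recovers a single shared function, lam -> 0 recovers fully task-specific functions.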
- Sample Selection for Fair and Robust Training [28.94276265328868]
We propose a sample selection-based algorithm for fair and robust training.
We show that our algorithm obtains fairness and robustness better than or comparable to the state-of-the-art technique.
arXiv Detail & Related papers (2021-10-27T07:17:29Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees; a short definition sketch follows this entry.
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
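For reference, the triangular discrimination mentioned above has a simple closed form; the following is a minimal sketch of the standard definition, not code from that paper.
```python
import numpy as np

def triangular_discrimination(P, Q):
    """D(P, Q) = sum_i (p_i - q_i)^2 / (p_i + q_i): a symmetric f-divergence,
    equivalent to the squared Hellinger distance up to constant factors."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    mask = (P + Q) > 0                      # skip coordinates where both are zero
    return np.sum((P[mask] - Q[mask]) ** 2 / (P[mask] + Q[mask]))

print(triangular_discrimination([0.5, 0.5], [0.9, 0.1]))  # ~0.381
```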
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost; a minimal sketch follows this entry.
arXiv Detail & Related papers (2020-06-16T17:58:53Z)
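To make the bucketing idea concrete, here is a minimal numpy sketch. The bucket size and the coordinate-wise median aggregator are assumed details for illustration, not necessarily that paper's exact choices.
```python
import numpy as np

rng = np.random.default_rng(1)

def bucketing(updates, s=2):
    """Randomly permute worker updates and average within buckets of size s,
    so that Byzantine updates are diluted before robust aggregation."""
    updates = updates[rng.permutation(len(updates))]
    return np.array([updates[i:i + s].mean(axis=0)
                     for i in range(0, len(updates), s)])

def coordinatewise_median(updates):
    """A standard robust aggregator, applied here to the bucket averages."""
    return np.median(updates, axis=0)

# 8 honest workers with heterogeneous gradients near 1.0, and 2 Byzantine
# workers sending arbitrary large messages.
honest = 1.0 + 0.5 * rng.standard_normal((8, 3))
byzantine = np.full((2, 3), 100.0)
agg = coordinatewise_median(bucketing(np.vstack([honest, byzantine]), s=2))
print(agg)  # stays close to the honest mean despite the outliers
```
Averaging within random buckets also reduces the variance among honest inputs, which is what lets off-the-shelf robust aggregators cope with heterogeneous data.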
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.