OpinionRank: Extracting Ground Truth Labels from Unreliable Expert
Opinions with Graph-Based Spectral Ranking
- URL: http://arxiv.org/abs/2102.05884v1
- Date: Thu, 11 Feb 2021 08:12:44 GMT
- Title: OpinionRank: Extracting Ground Truth Labels from Unreliable Expert
Opinions with Graph-Based Spectral Ranking
- Authors: Glenn Dawson and Robi Polikar
- Abstract summary: crowdsourcing has emerged as a popular, inexpensive, and efficient data mining solution for performing distributed label collection.
We propose OpinionRank, a model-free, interpretable, graph-based spectral algorithm for integrating crowdsourced annotations into reliable labels.
Our experiments show that OpinionRank performs favorably when compared against more highly parameterized algorithms.
- Score: 2.1930130356902207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As larger and more comprehensive datasets become standard in contemporary
machine learning, it becomes increasingly more difficult to obtain reliable,
trustworthy label information with which to train sophisticated models. To
address this problem, crowdsourcing has emerged as a popular, inexpensive, and
efficient data mining solution for performing distributed label collection.
However, crowdsourced annotations are inherently untrustworthy, as the labels
are provided by anonymous volunteers who may have varying, unreliable
expertise. Worse yet, some participants on commonly used platforms such as
Amazon Mechanical Turk may be adversarial, and provide intentionally incorrect
label information without the end user's knowledge. We discuss three
conventional models of the label generation process, describing their
parameterizations and the model-based approaches used to solve them. We then
propose OpinionRank, a model-free, interpretable, graph-based spectral
algorithm for integrating crowdsourced annotations into reliable labels for
performing supervised or semi-supervised learning. Our experiments show that
OpinionRank performs favorably when compared against more highly parameterized
algorithms. We also show that OpinionRank is scalable to very large datasets
and numbers of label sources, and requires considerably less computational
resources than previous approaches.
Related papers
- Coupled Confusion Correction: Learning from Crowds with Sparse
Annotations [43.94012824749425]
Confusion matrices learned by two models can be corrected by the distilled data from the other.
We cluster the annotator groups'' who share similar expertise so that their confusion matrices could be corrected together.
arXiv Detail & Related papers (2023-12-12T14:47:26Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Going Beyond One-Hot Encoding in Classification: Can Human Uncertainty
Improve Model Performance? [14.610038284393166]
We show that label uncertainty is explicitly embedded into the training process via distributional labels.
The incorporation of label uncertainty helps the model to generalize better to unseen data and increases model performance.
Similar to existing calibration methods, the distributional labels lead to better-calibrated probabilities, which in turn yield more certain and trustworthy predictions.
arXiv Detail & Related papers (2022-05-30T17:19:11Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Confident in the Crowd: Bayesian Inference to Improve Data Labelling in
Crowdsourcing [0.30458514384586394]
We present new techniques to improve the quality of the labels while attempting to reduce the cost.
This paper investigates the use of more sophisticated methods, such as Bayesian inference, to measure the performance of the labellers.
Our methods outperform the standard voting methods in both cost and accuracy while maintaining higher reliability when there is disagreement within the crowd.
arXiv Detail & Related papers (2021-05-28T17:09:45Z) - Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z) - CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific
Perturbations for Tabular Data [8.276156981100364]
Co-teaching methods have shown promising improvements for computer vision problems with noisy labels.
Our model, CrowdTeacher, uses the idea that robustness in the input space model can improve the perturbation of the classifier for noisy labels.
We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets.
arXiv Detail & Related papers (2021-03-31T15:09:38Z) - Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z) - End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.