End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models
- URL: http://arxiv.org/abs/2011.06833v1
- Date: Fri, 13 Nov 2020 09:48:30 GMT
- Title: End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models
- Authors: Taraneh Younesian, Chi Hong, Amirmasoud Ghiassi, Robert Birke, Lydia
Y. Chen
- Abstract summary: We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
- Score: 6.278267504352446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeling real-world datasets is time consuming but indispensable for
supervised machine learning models. A common solution is to distribute the
labeling task across a large number of non-expert workers via crowd-sourcing.
Due to the varying background and experience of crowd workers, the obtained
labels are highly prone to errors and even detrimental to the learning models.
In this paper, we advocate using hybrid intelligence, i.e., combining deep
models and human experts, to design an end-to-end learning framework from noisy
crowd-sourced data, especially in an on-line scenario. We first summarize the
state-of-the-art solutions that address the challenges of noisy labels from
non-expert crowd and learn from multiple annotators. We show how label
aggregation can benefit from estimating the annotators' confusion matrices to
improve the learning process. Moreover, with the help of an expert labeler as
well as classifiers, we cleanse aggregated labels of highly informative samples
to enhance the final classification accuracy. We demonstrate the effectiveness
of our strategies on several image datasets, namely UCI and CIFAR-10, using SVM
and deep neural networks. Our evaluation shows that our on-line label
aggregation with confusion matrix estimation reduces the error rate of labels
by over 30%. Furthermore, having the expert relabel only 10% of the data
results in over 90% classification accuracy with SVM.
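The abstract's idea of aggregating crowd labels through per-annotator confusion matrices can be illustrated with a minimal sketch. This is not the paper's on-line algorithm: it is a generic, batch, hard-assignment variant in the spirit of Dawid-Skene, assuming a uniform class prior and that every annotator labels every sample. The function names (`majority_vote`, `estimate_confusion`, `aggregate`) and the `smoothing` and `rounds` parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among one sample's votes."""
    return Counter(votes).most_common(1)[0][0]


def estimate_confusion(labels, estimates, num_classes, smoothing=1.0):
    """Row-normalised confusion matrix for a single annotator.

    Entry [t][g] approximates P(annotator gives g | true label is t),
    using the current label estimates in place of the unknown truth.
    Additive smoothing (an assumption) keeps every probability non-zero.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for given, true in zip(labels, estimates):
        counts[true][given] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def aggregate(crowd_labels, num_classes, rounds=5):
    """Aggregate crowd_labels[a][i] (annotator a, sample i) into one label per sample."""
    num_samples = len(crowd_labels[0])
    # Start from plain majority voting.
    estimates = [
        majority_vote([ann[i] for ann in crowd_labels])
        for i in range(num_samples)
    ]
    for _ in range(rounds):
        matrices = [
            estimate_confusion(ann, estimates, num_classes)
            for ann in crowd_labels
        ]
        updated = []
        for i in range(num_samples):
            # Log-likelihood of each candidate true class under all annotators.
            scores = [
                sum(math.log(m[k][ann[i]]) for m, ann in zip(matrices, crowd_labels))
                for k in range(num_classes)
            ]
            updated.append(max(range(num_classes), key=scores.__getitem__))
        if updated == estimates:
            break  # Labels stopped changing.
        estimates = updated
    # Recompute the matrices so they match the final estimates.
    matrices = [
        estimate_confusion(ann, estimates, num_classes) for ann in crowd_labels
    ]
    return estimates, matrices
```

Calling `aggregate(crowd_labels, num_classes=2)` on a list of per-annotator label lists returns the aggregated labels together with the estimated confusion matrices. Annotators whose matrices are close to uniform contribute little to the log-likelihood scores, which is how confusion-matrix weighting can improve on plain majority voting.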
Related papers
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise
Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense
Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning
Classification [0.0]
We introduce Crowd-Certain, a novel approach for label aggregation in crowdsourced and ensemble learning classification tasks.
The proposed method uses the consistency of the annotators versus a trained classifier to determine a reliability score for each annotator.
We extensively evaluated our approach against ten existing techniques across ten different datasets, each labeled by varying numbers of annotators.
arXiv Detail & Related papers (2023-10-25T01:58:37Z)
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weak Supervised Learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- Is margin all you need? An extensive empirical study of active learning
on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
- Learning from Label Proportions by Learning with Label Noise [30.7933303912474]
Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags.
We provide a theoretically grounded approach to LLP based on a reduction to learning with label noise.
Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures.
arXiv Detail & Related papers (2022-03-04T18:52:21Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World
Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- OpinionRank: Extracting Ground Truth Labels from Unreliable Expert
Opinions with Graph-Based Spectral Ranking [2.1930130356902207]
Crowdsourcing has emerged as a popular, inexpensive, and efficient data mining solution for performing distributed label collection.
We propose OpinionRank, a model-free, interpretable, graph-based spectral algorithm for integrating crowdsourced annotations into reliable labels.
Our experiments show that OpinionRank performs favorably when compared against more highly parameterized algorithms.
arXiv Detail & Related papers (2021-02-11T08:12:44Z)
- Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z)