Active Learning for Noisy Data Streams Using Weak and Strong Labelers
- URL: http://arxiv.org/abs/2010.14149v1
- Date: Tue, 27 Oct 2020 09:18:35 GMT
- Title: Active Learning for Noisy Data Streams Using Weak and Strong Labelers
- Authors: Taraneh Younesian, Dick Epema, Lydia Y. Chen
- Abstract summary: We consider a novel weak and strong labeler problem inspired by humans natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
- Score: 3.9370369973510746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeling data correctly is an expensive and challenging task in machine
learning, especially for on-line data streams. Deep learning models especially
require a large number of clean labeled data that is very difficult to acquire
in real-world problems. Choosing useful data samples to label while minimizing
the cost of labeling is crucial to maintain efficiency in the training process.
When confronted with multiple labelers with different expertise and respective
labeling costs, deciding which labeler to choose is nontrivial. In this paper,
we consider a novel weak and strong labeler problem inspired by humans natural
ability for labeling, in the presence of data streams with noisy labels and
constrained by a limited budget. We propose an on-line active learning
algorithm that consists of four steps: filtering, adding diversity, informative
sample selection, and labeler selection. We aim to filter out the suspicious
noisy samples and spend the budget on the diverse informative data using strong
and weak labelers in a cost-effective manner. We derive a decision function
that measures the information gain by combining the informativeness of
individual samples and model confidence. We evaluate our proposed algorithm on
the well-known image classification datasets CIFAR10 and CIFAR100 with up to
60% noise. Experiments show that by intelligently deciding which labeler to
query, our algorithm maintains the same accuracy compared to the case of having
only one of the labelers available while spending less of the budget.
Related papers
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy
Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and
Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z) - Eliciting and Learning with Soft Labels from Every Annotator [31.10635260890126]
We focus on efficiently eliciting soft labels from individual annotators.
We demonstrate that learning with our labels achieves comparable model performance to prior approaches.
arXiv Detail & Related papers (2022-07-02T12:03:00Z) - UNICON: Combating Label Noise Through Uniform Selection and Contrastive
Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z) - Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called emphTrustable Co-label Learning (TCL)
arXiv Detail & Related papers (2022-03-08T16:57:00Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Cost-Accuracy Aware Adaptive Labeling for Active Learning [9.761953860259942]
In many real settings, different labelers have different labeling costs and can yield different labeling accuracies.
We propose a new algorithm for selecting instances, labelers and their corresponding costs and labeling accuracies.
Our proposed algorithm demonstrates state-of-the-art performance on five UCI and a real crowdsourcing dataset.
arXiv Detail & Related papers (2021-05-24T17:21:00Z) - Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise aroused by the auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z) - End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.