Learning with Noisy Labels by Targeted Relabeling
- URL: http://arxiv.org/abs/2110.08355v1
- Date: Fri, 15 Oct 2021 20:37:29 GMT
- Title: Learning with Noisy Labels by Targeted Relabeling
- Authors: Derek Chen, Zhou Yu, and Samuel R. Bowman
- Abstract summary: Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
- Score: 52.0329205268734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Crowdsourcing platforms are often used to collect datasets for training deep
neural networks, despite higher levels of inaccurate labeling compared to
expert labeling. There are two common strategies to manage the impact of this
noise. The first aggregates redundant annotations, but comes at the expense of
labeling substantially fewer examples. The second, explored in prior work, uses
the entire annotation budget to label as many examples as possible and then
applies denoising algorithms to implicitly clean up the
dataset. We propose an approach which instead reserves a fraction of
annotations to explicitly relabel highly probable labeling errors. In
particular, we allocate a large portion of the labeling budget to form an
initial dataset used to train a model. This model is then used to identify
specific examples that appear most likely to be incorrect, which we relabel
with the remaining budget. Experiments across three model variations and four
natural language processing tasks show our approach outperforms both label
aggregation and advanced denoising methods designed to handle noisy labels when
allocated the same annotation budget.
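The budget-splitting procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`split_budget`, `rank_suspects`, `targeted_relabel`) and the callbacks `annotate` and `train_and_predict` are hypothetical. The scoring rule (flag examples whose given label receives low model probability) and the choice to overwrite a suspect label with the fresh annotation are also assumptions; the paper may detect errors and combine labels differently.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def split_budget(total_budget: int, relabel_frac: float) -> Tuple[int, int]:
    """Split the annotation budget into (initial labels, reserved relabels)."""
    if not 0.0 <= relabel_frac < 1.0:
        raise ValueError("relabel_frac must be in [0, 1)")
    reserve = int(total_budget * relabel_frac)
    return total_budget - reserve, reserve


def rank_suspects(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return example indices ordered from most to least likely to be mislabeled.

    Assumed heuristic: an example is suspicious when the trained model assigns
    low probability to the label the annotator gave it.
    """
    return sorted(range(len(labels)), key=lambda i: probs[i][labels[i]])


def targeted_relabel(
    pool: Sequence[X],
    total_budget: int,
    relabel_frac: float,
    annotate: Callable[[X], int],
    train_and_predict: Callable[[List[X], List[int]], List[List[float]]],
) -> Tuple[List[X], List[int]]:
    """Label, train, flag likely errors, then relabel them with the reserved budget."""
    n_initial, n_reserve = split_budget(total_budget, relabel_frac)
    # 1. Spend most of the budget on one (possibly noisy) label per example.
    examples = list(pool[:n_initial])
    labels = [annotate(x) for x in examples]
    # 2. Train on the noisy data; get per-class probabilities for each training example.
    probs = train_and_predict(examples, labels)
    # 3. Spend the reserved budget on the most suspicious examples.
    #    Assumption: the fresh annotation simply overwrites the old label.
    for i in rank_suspects(probs, labels)[:n_reserve]:
        labels[i] = annotate(examples[i])
    return examples, labels
```

In practice, `annotate` would request a label from the crowdsourcing platform, and `train_and_predict` would fit a classifier and return its class probabilities for the training examples. A final model would then be retrained on the returned, partially cleaned labels.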
Related papers
- Drawing the Same Bounding Box Twice? Coping Noisy Annotations in Object Detection with Repeated Labels [6.872072177648135]
We propose a novel localization algorithm that adapts well-established ground truth estimation methods.
Our algorithm also shows superior performance during training on the TexBiG dataset.
(2023-09-18T13:08:44Z)
- Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
(2023-07-25T19:40:41Z)
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
(2023-07-17T08:31:59Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To reduce the need for labeled data, self-training, which pseudo-labels readily available unlabeled data, is widely used in both academia and industry.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
(2022-02-15T02:14:33Z)
- SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining [76.95808270536318]
We propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using Pseudo-positive mining.
While the labeled regions are processed as usual, self-supervised learning is used to process the unlabeled regions.
We conduct exhaustive experiments on five splits on the PASCAL-VOC and COCO datasets achieving state-of-the-art performance.
(2022-01-12T18:57:04Z)
- Multi-label Classification with Partial Annotations using Class-aware Selective Loss [14.3159150577502]
Large-scale multi-label classification datasets are commonly partially annotated.
We analyze the partial labeling problem, then propose a solution based on two key ideas.
With our novel approach, we achieve state-of-the-art results on OpenImages dataset.
(2021-10-21T08:10:55Z)
- Learning with Different Amounts of Annotation: From Zero to Many Labels [19.869498599986006]
Training NLP systems typically assumes access to annotated data that has a single human label per example.
We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples.
Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on natural language inference and entity typing tasks.
(2021-09-09T16:48:41Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
(2021-09-01T23:52:29Z)
- Learning from Noisy Labels for Entity-Centric Information Extraction [17.50856935207308]
We propose a simple co-regularization framework for entity-centric information extraction.
The framework's models are jointly optimized with a task-specific loss and are regularized to generate similar predictions.
In the end, we can take any of the trained models for inference.
(2021-04-17T22:49:12Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
(2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.