To Aggregate or Not? Learning with Separate Noisy Labels
- URL: http://arxiv.org/abs/2206.07181v1
- Date: Tue, 14 Jun 2022 21:32:26 GMT
- Title: To Aggregate or Not? Learning with Separate Noisy Labels
- Authors: Jiaheng Wei, Zhaowei Zhu, Tianyi Luo, Ehsan Amid, Abhishek Kumar, Yang Liu
- Abstract summary: This paper addresses the question of whether one should aggregate separate noisy labels into single ones or use them separately as given.
We theoretically analyze the performance of both approaches under the empirical risk minimization framework.
Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high, or the number of labelers/annotations is insufficient.
- Score: 28.14966756980763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Raw training data often comes with separate noisy labels collected from multiple imperfect annotators (e.g., via crowdsourcing). Typically one would first aggregate the separate noisy labels into a single label and then apply standard training methods. The literature has also studied effective aggregation approaches extensively. This paper revisits this choice and aims to answer the question of whether one should aggregate separate noisy labels into single ones or use them separately as given. We theoretically analyze the performance of both approaches under the empirical risk minimization framework for a number of popular loss functions, including ones designed specifically for the problem of learning with noisy labels. Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high or the number of labelers/annotations is insufficient. Extensive empirical results validate our conclusion.
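To make the comparison concrete, below is a minimal sketch of the two training schemes the abstract contrasts: majority-vote aggregation versus keeping each noisy label as its own training example. The synthetic data, symmetric noise rate, number of annotators, and logistic-regression learner are all illustrative assumptions and are not taken from the paper's loss functions or experiments.

```python
# Illustrative sketch (not the paper's experimental setup): compare
# "aggregate" (majority vote over K noisy labels, then standard ERM)
# against "separate" (each noisy label kept as its own training example).
# Noise is symmetric with rate `eps`; all settings below are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

K, eps = 3, 0.4  # few annotators, high noise: the regime claimed to favor separation

# Each of K annotators flips the clean binary label with probability eps.
noisy = np.stack(
    [np.where(rng.random(len(y_tr)) < eps, 1 - y_tr, y_tr) for _ in range(K)],
    axis=1,
)

# (a) Label aggregation: majority vote over the K labels, then standard training.
agg_labels = (noisy.mean(axis=1) > 0.5).astype(int)
acc_agg = LogisticRegression(max_iter=1000).fit(X_tr, agg_labels).score(X_te, y_te)

# (b) Label separation: replicate each feature vector once per annotator
#     and train directly on all (x, noisy label) pairs.
X_sep = np.repeat(X_tr, K, axis=0)
y_sep = noisy.reshape(-1)
acc_sep = LogisticRegression(max_iter=1000).fit(X_sep, y_sep).score(X_te, y_te)

print(f"aggregation: {acc_agg:.3f}  separation: {acc_sep:.3f}")
```

Varying `eps` and `K` in this sketch is a quick way to probe the regimes discussed in the abstract: higher noise rates or fewer annotators are where label separation is claimed to be preferable.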
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z) - Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label models with partial positive labels (MLR-PPL) attracts increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Rethinking Noisy Label Models: Labeler-Dependent Noise with Adversarial Awareness [2.1930130356902207]
We propose a principled model of label noise that generalizes instance-dependent noise to multiple labelers.
Under our labeler-dependent model, label noise manifests itself under two modalities: natural error of good-faith labelers, and adversarial labels provided by malicious actors.
We present two adversarial attack vectors that more accurately reflect the label noise that may be encountered in real-world settings.
arXiv Detail & Related papers (2021-05-28T19:58:18Z) - Harmless label noise and informative soft-labels in supervised classification [1.6752182911522517]
Manual labelling of training examples is common practice in supervised learning.
When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the training dataset.
In particular, when classification difficulty is the only source of label errors, multiple sets of noisy labels can supply more information for the estimation of a classification rule.
arXiv Detail & Related papers (2021-04-07T02:56:11Z) - A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z) - Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
For the ease of other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.