To Aggregate or Not? Learning with Separate Noisy Labels
- URL: http://arxiv.org/abs/2206.07181v1
- Date: Tue, 14 Jun 2022 21:32:26 GMT
- Title: To Aggregate or Not? Learning with Separate Noisy Labels
- Authors: Jiaheng Wei, Zhaowei Zhu, Tianyi Luo, Ehsan Amid, Abhishek Kumar, Yang Liu
- Abstract summary: This paper addresses the question of whether one should aggregate separate noisy labels into single ones or use them separately as given.
We theoretically analyze the performance of both approaches under the empirical risk minimization framework.
Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high, or the number of labelers/annotations is insufficient.
- Score: 28.14966756980763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Raw training data often comes with separate noisy labels collected from multiple imperfect annotators (e.g., via crowdsourcing). Typically one would first aggregate the separate noisy labels into a single label and then apply standard training methods. The literature has also studied effective aggregation approaches extensively. This paper revisits this choice and aims to answer the question of whether one should aggregate separate noisy labels into single ones or use them separately as given. We theoretically analyze the performance of both approaches under the empirical risk minimization framework for a number of popular loss functions, including ones designed specifically for the problem of learning with noisy labels. Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high or the number of labelers/annotations is insufficient. Extensive empirical results validate our conclusion.
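To make the comparison concrete, below is a minimal sketch of the two training schemes the abstract contrasts: majority-vote aggregation versus keeping each noisy label as its own training example. The synthetic data, symmetric noise rate, number of annotators, and logistic-regression learner are all illustrative assumptions and are not taken from the paper's loss functions or experiments.

```python
# Illustrative sketch (not the paper's experimental setup): compare
# "aggregate" (majority vote over K noisy labels, then standard ERM)
# against "separate" (each noisy label kept as its own training example).
# Noise is symmetric with rate `eps`; all settings below are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

K, eps = 3, 0.4  # few annotators, high noise: the regime claimed to favor separation

# Each of K annotators flips the clean binary label with probability eps.
noisy = np.stack(
    [np.where(rng.random(len(y_tr)) < eps, 1 - y_tr, y_tr) for _ in range(K)],
    axis=1,
)

# (a) Label aggregation: majority vote over the K labels, then standard training.
agg_labels = (noisy.mean(axis=1) > 0.5).astype(int)
acc_agg = LogisticRegression(max_iter=1000).fit(X_tr, agg_labels).score(X_te, y_te)

# (b) Label separation: replicate each feature vector once per annotator
#     and train directly on all (x, noisy label) pairs.
X_sep = np.repeat(X_tr, K, axis=0)
y_sep = noisy.reshape(-1)
acc_sep = LogisticRegression(max_iter=1000).fit(X_sep, y_sep).score(X_te, y_te)

print(f"aggregation: {acc_agg:.3f}  separation: {acc_sep:.3f}")
```

Varying `eps` and `K` in this sketch is a quick way to probe the regimes discussed in the abstract: higher noise rates or fewer annotators are where label separation is claimed to be preferable.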
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z) - Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label models with partial positive labels (MLR-PPL) attracts increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Rethinking Noisy Label Models: Labeler-Dependent Noise with Adversarial Awareness [2.1930130356902207]
We propose a principled model of label noise that generalizes instance-dependent noise to multiple labelers.
Under our labeler-dependent model, label noise manifests itself under two modalities: natural error of good-faith labelers, and adversarial labels provided by malicious actors.
We present two adversarial attack vectors that more accurately reflect the label noise that may be encountered in real-world settings.
arXiv Detail & Related papers (2021-05-28T19:58:18Z) - Harmless label noise and informative soft-labels in supervised classification [1.6752182911522517]
Manual labelling of training examples is common practice in supervised learning.
When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the training dataset.
In particular, when classification difficulty is the only source of label errors, multiple sets of noisy labels can supply more information for the estimation of a classification rule.
arXiv Detail & Related papers (2021-04-07T02:56:11Z) - A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z) - Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
For the ease of other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.