Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
- URL: http://arxiv.org/abs/2010.02347v2
- Date: Mon, 22 Mar 2021 22:01:05 GMT
- Title: Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
- Authors: Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu
- Abstract summary: Human-annotated labels are often prone to noise.
The presence of such noise will degrade the performance of the resulting deep neural network (DNN) models.
We propose CORES$^{2}$, which progressively sieves out corrupted examples.
- Score: 24.143469284851456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-annotated labels are often prone to noise, and the presence of such
noise will degrade the performance of the resulting deep neural network (DNN)
models. Much of the literature (with several recent exceptions) on learning
with noisy labels focuses on the case where the label noise is independent of
features. In practice, annotation errors tend to be instance-dependent and
often depend on the difficulty of recognizing a certain task. Applying
existing results from instance-independent settings would require a significant
amount of estimation of noise rates. Therefore, providing theoretically
rigorous solutions for learning with instance-dependent label noise remains a
challenge. In this paper, we propose CORES$^{2}$ (COnfidence REgularized Sample
Sieve), which progressively sieves out corrupted examples. The implementation
of CORES$^{2}$ does not require specifying noise rates and yet we are able to
provide theoretical guarantees of CORES$^{2}$ in filtering out the corrupted
examples. This high-quality sample sieve allows us to treat clean examples and
the corrupted ones separately in training a DNN solution, and such a separation
is shown to be advantageous in the instance-dependent noise setting. We
demonstrate the performance of CORES$^{2}$ on CIFAR10 and CIFAR100 datasets
with synthetic instance-dependent label noise and Clothing1M with real-world
human noise. Of independent interest, our sample sieve provides a generic
machinery for anatomizing noisy datasets and a flexible interface for
various robust training techniques to further improve performance. Code is
available at https://github.com/UCSC-REAL/cores.
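For illustration only, below is a minimal PyTorch-style sketch of the two ideas the abstract describes: a confidence-regularized loss (cross-entropy minus a term that penalizes over-confident fitting of noisy labels) and a per-sample sieve that keeps examples whose regularized loss falls below a threshold. The function names, the batch-level noisy-label prior, the fixed beta, and the averaged-over-classes threshold are assumptions made for this sketch; the exact regularizer, the threshold alpha_n, and the schedule for increasing beta are defined in the paper and the linked repository.

```python
import torch
import torch.nn.functional as F

def confidence_regularized_loss(logits, noisy_labels, beta=2.0):
    """Cross-entropy minus beta times the expected cross-entropy under a
    label prior. The prior here is the empirical distribution of the noisy
    labels in the batch, and beta=2.0 is an illustrative default."""
    ce = F.cross_entropy(logits, noisy_labels, reduction="none")   # l(f(x_n), y~_n)
    log_probs = F.log_softmax(logits, dim=1)
    prior = torch.bincount(noisy_labels, minlength=logits.size(1)).float()
    prior = prior / prior.sum()
    expected_ce = -(log_probs * prior.unsqueeze(0)).sum(dim=1)     # E_{Y~prior}[l(f(x_n), Y)]
    return ce - beta * expected_ce

def sieve_batch(logits, noisy_labels, beta=2.0):
    """Flag an example as clean when its regularized loss falls below a
    per-sample threshold, taken here as the average regularized loss over
    all candidate classes (a simplification of the paper's alpha_n)."""
    loss = confidence_regularized_loss(logits, noisy_labels, beta)
    log_probs = F.log_softmax(logits, dim=1)
    prior = torch.bincount(noisy_labels, minlength=logits.size(1)).float()
    prior = prior / prior.sum()
    expected_ce = -(log_probs * prior.unsqueeze(0)).sum(dim=1)
    all_ce = -log_probs                                            # l(f(x_n), j) for every class j
    alpha = (all_ce - beta * expected_ce.unsqueeze(1)).mean(dim=1)
    return loss < alpha                                            # boolean clean mask
```

In the paper's training pipeline, examples flagged as clean keep the usual supervised loss while the sieved-out ones can be treated separately (for example with an unsupervised consistency term), and the regularization strength is increased gradually rather than held fixed as in this sketch.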
Related papers
- Binary Classification with Instance and Label Dependent Label Noise [4.061135251278187]
We show that learning solely with noisy samples is impossible without access to clean samples or strong assumptions on the distribution of the data.
arXiv Detail & Related papers (2023-06-06T04:47:44Z) - Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling [22.62790706276081]
Training deep neural network (DNN) with noisy labels is practically challenging.
Previous efforts tend to handle either part of or the full dataset in a unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify the hard noises.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z) - UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z) - Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z) - Confidence Scores Make Instance-dependent Label-noise Learning Possible [129.84497190791103]
In learning with noisy labels, for every instance, its label can randomly walk to other classes following a transition distribution, which is called the noise model.
We introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score.
We find with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
arXiv Detail & Related papers (2020-01-11T16:15:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.