Binary Classification with Instance and Label Dependent Label Noise
- URL: http://arxiv.org/abs/2306.03402v1
- Date: Tue, 6 Jun 2023 04:47:44 GMT
- Title: Binary Classification with Instance and Label Dependent Label Noise
- Authors: Hyungki Im and Paul Grigas
- Abstract summary: We show that, without additional assumptions, empirical risk minimization achieves the optimal excess risk bound, and that the minimax lower bound for the 0-1 loss is proportional to the average noise rate.
Our findings suggest that learning solely with noisy samples is impossible without access to clean samples or strong assumptions on the distribution of the data.
- Score: 4.061135251278187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning with label dependent label noise has been extensively explored in
both theory and practice; however, dealing with instance (i.e., feature) and
label dependent label noise continues to be a challenging task. The difficulty
arises from the fact that the noise rate varies for each instance, making it
challenging to estimate accurately. The question of whether it is possible to
learn a reliable model using only noisy samples remains unresolved. We answer
this question with a theoretical analysis that provides matching upper and
lower bounds. Surprisingly, our results show that, without any additional
assumptions, empirical risk minimization achieves the optimal excess risk
bound. First, we derive a novel excess risk bound proportional to the
noise level, which holds in very general settings, by comparing the empirical
risk minimizers obtained from clean samples and noisy samples. Second, we show
that the minimax lower bound for the 0-1 loss is a constant proportional to the
average noise rate. Our findings suggest that learning solely with noisy
samples is impossible without access to clean samples or strong assumptions on
the distribution of the data.
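Stated schematically in our own shorthand (not the paper's notation): write $\rho(x, y)$ for the probability that the label of $(x, y)$ is flipped, $R$ for the clean risk, $\mathcal{H}$ for the hypothesis class, and $\hat{f}_n$ for the empirical risk minimizer computed from $n$ noisy samples. The matching bounds then read roughly as follows.

```latex
% Schematic form of the matching bounds (our notation; see the paper for the
% precise constants, conditions, and complexity term).

% Upper bound: ERM on noisy samples pays the usual estimation error plus a
% term proportional to the noise level.
R(\hat{f}_n) - \inf_{f \in \mathcal{H}} R(f)
  \;\lesssim\; c_1 \, \mathbb{E}\left[ \rho(X, Y) \right]
  \;+\; O\!\left( \sqrt{ \tfrac{\mathrm{comp}(\mathcal{H})}{n} } \right)

% Lower bound: in the minimax sense, no learner that sees only noisy samples
% can avoid an excess 0-1 risk proportional to the average noise rate.
\inf_{\hat{f}} \; \sup_{P} \;
  \Bigl( R_{0\text{-}1}(\hat{f}) - \inf_{f \in \mathcal{H}} R_{0\text{-}1}(f) \Bigr)
  \;\gtrsim\; c_2 \, \mathbb{E}\left[ \rho(X, Y) \right]
```

Matching upper and lower bounds of this shape are exactly why the abstract concludes that the noise-level term is unavoidable without clean samples or distributional assumptions.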
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo-labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method extracts a class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
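A minimal sketch of the prototype-style pseudo-labeling idea summarized above, in plain NumPy. Everything here (the nearest-prototype rule, the per-class quota used to keep the subset balanced, all names) is our illustrative reading of the summary, not the authors' method; the paper's distribution-matching machinery is more involved.

```python
import numpy as np

def select_balanced_clean_subset(features, noisy_labels, per_class_quota):
    """Pseudo-label by nearest class prototype, then keep, per class, the
    samples whose noisy label agrees with the pseudo-label and which lie
    closest to the prototype -- a crude stand-in for distribution matching."""
    classes = np.unique(noisy_labels)
    # Class prototypes: mean feature vector of each (noisy) class.
    prototypes = np.stack([features[noisy_labels == c].mean(axis=0) for c in classes])
    # Distance of every sample to every prototype, shape (n_samples, n_classes).
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    pseudo = classes[dists.argmin(axis=1)]

    keep = []
    for i, c in enumerate(classes):
        # Candidates: samples whose noisy label and pseudo-label agree on c.
        idx = np.where((noisy_labels == c) & (pseudo == c))[0]
        # An equal quota per class keeps the subset balanced despite the long tail.
        order = idx[np.argsort(dists[idx, i])]
        keep.extend(order[:per_class_quota].tolist())
    return np.array(keep)

# Toy usage: 2-D features, 3 classes, roughly 20% of labels flipped at random.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.5, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)
y_noisy = np.where(rng.random(150) < 0.2, rng.integers(0, 3, 150), y)
print(len(select_balanced_clean_subset(X, y_noisy, per_class_quota=20)), "samples kept")
```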
- Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most existing de-noising methods fail to identify this hard noise.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z)
- Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and can even come from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
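For context, the classical noise-contrastive estimator that the entry above revisits fits a logistic discriminator between data samples and samples from a noise distribution with known density. The Gaussian toy model and all names below are our own setup, not the paper's experiments; the entry's claim concerns which noise choice yields the statistically better estimator.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Data: 1-D Gaussian with unknown mean (truth: 2.0) and known unit variance.
x_data = rng.normal(2.0, 1.0, size=2000)

def nce_loss(theta, x_noise, noise_logpdf):
    """Negative NCE objective: logistic discrimination of data vs. noise with
    log-odds G(x) = log p_theta(x) - log p_noise(x)."""
    def G(x):
        return norm.logpdf(x, loc=theta[0], scale=1.0) - noise_logpdf(x)
    # softplus(-G) = -log sigma(G) on data; softplus(G) = -log(1 - sigma(G)) on noise.
    return (np.logaddexp(0.0, -G(x_data)).mean()
            + np.logaddexp(0.0, G(x_noise)).mean())

def fit_nce(noise_loc, noise_scale):
    x_noise = rng.normal(noise_loc, noise_scale, size=2000)
    noise_logpdf = lambda x: norm.logpdf(x, loc=noise_loc, scale=noise_scale)
    return minimize(nce_loss, x0=[0.0], args=(x_noise, noise_logpdf)).x[0]

# Folklore choice (noise matched to the data) vs. a deliberately different noise:
print("noise ~ data  :", fit_nce(2.0, 1.0))
print("flatter noise :", fit_nce(2.0, 3.0))
```

Both noise choices recover the true mean on this toy problem; the paper's point concerns the asymptotic variance of such estimators, which the noise distribution controls.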
- Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and can even benefit robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z)
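As described in the entry above, ODNL augments training with open-set (out-of-distribution) inputs whose labels are re-drawn at random every time they are used. A minimal PyTorch-style training step is sketched below; the uniform re-labeling and all names are our reading of the summary, not the released implementation.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch, openset_batch, num_classes):
    """One step of training with Open-set samples with Dynamic Noisy Labels:
    the open-set inputs receive fresh, uniformly random labels at every step."""
    x, y = batch            # possibly noisy in-distribution data
    x_open = openset_batch  # open-set inputs; their true labels are irrelevant
    # Dynamic noisy labels: re-sampled uniformly each time they are seen.
    y_open = torch.randint(0, num_classes, (x_open.size(0),), device=x_open.device)

    inputs = torch.cat([x, x_open], dim=0)
    targets = torch.cat([y, y_open], dim=0)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```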
- LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment [33.376639002442914]
We propose LongReMix, a new two-stage noisy-label training algorithm.
We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N.
Our approach achieves state-of-the-art performance on most of these datasets.
arXiv Detail & Related papers (2021-03-06T18:48:40Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Learning with Instance-Dependent Label Noise: A Sample Sieve Approach [24.143469284851456]
Human-annotated labels are often prone to noise.
The presence of such noise will degrade the performance of the resulting deep neural network (DNN) models.
We propose CORES$^2$, which progressively sieves out corrupted examples.
arXiv Detail & Related papers (2020-10-05T21:44:09Z)
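A hedged sketch of the sample-sieve idea from the entry above: after each epoch, samples whose loss looks too large relative to the trusted set are treated as corrupted and excluded from subsequent epochs. The threshold rule and all names here are illustrative placeholders; CORES$^2$ derives its sieve from a specific confidence-regularized criterion with guarantees.

```python
import numpy as np

def sample_sieve(per_sample_loss_fn, train_fn, X, y, epochs=10):
    """Progressively sieve out likely-corrupted examples.

    per_sample_loss_fn(X, y) -> array of losses under the current model;
    train_fn(X, y) performs one epoch of training on the given subset.
    Both are placeholders for a real model/optimizer pair."""
    active = np.ones(len(y), dtype=bool)    # all samples start in the sieve
    for _ in range(epochs):
        train_fn(X[active], y[active])      # fit on currently trusted samples
        losses = per_sample_loss_fn(X, y)
        # Illustrative criterion: drop samples whose loss is far above the
        # trusted-set mean (CORES^2 instead uses a confidence-regularized score).
        threshold = losses[active].mean() + 2 * losses[active].std()
        active &= losses <= threshold
    return active                            # mask of samples judged clean

# Toy usage: "loss" = distance to class mean; no real network needed.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])[:, None]
y = np.repeat([0, 1], 100)
y[rng.choice(200, 30, replace=False)] ^= 1  # flip 15% of the labels
means = {}
train = lambda Xs, ys: means.update({c: Xs[ys == c].mean() for c in (0, 1)})
loss = lambda Xs, ys: np.abs(Xs[:, 0] - np.vectorize(means.get)(ys))
print(sample_sieve(loss, train, X, y).sum(), "of 200 samples kept")
```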
- Confidence Scores Make Instance-dependent Label-noise Learning Possible [129.84497190791103]
In learning with noisy labels, for every instance, its label can randomly walk to other classes following a transition distribution, which is called a noise model.
We introduce confidence-scored instance-dependent noise (CSIDN), where each instance-label pair is equipped with a confidence score.
We find that, with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
arXiv Detail & Related papers (2020-01-11T16:15:41Z)
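To make the entry's noise model concrete, here is a small simulation of confidence-scored instance-dependent noise: each instance carries its own label-transition distribution, and each observed pair comes with a confidence score. The functional form of the transition matrix and the exact definition of the score are invented for illustration; only the per-instance-transition-plus-confidence structure comes from the summary.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_CLASSES = 3

def transition_matrix(x):
    """Invented instance-dependent noise model: instances closer to the
    origin keep their true label with higher probability."""
    keep = 0.5 + 0.5 / (1.0 + np.linalg.norm(x))    # P(label kept), in (0.5, 1]
    T = np.full((NUM_CLASSES, NUM_CLASSES), (1.0 - keep) / (NUM_CLASSES - 1))
    np.fill_diagonal(T, keep)
    return T

def corrupt(x, true_label):
    """Draw a noisy label for one instance along with a confidence score."""
    row = transition_matrix(x)[true_label]  # transition distribution of x
    noisy = rng.choice(NUM_CLASSES, p=row)
    confidence = row[noisy]                 # illustrative score for the pair
    return noisy, confidence

# Five random instances: print true label, noisy label, confidence score.
X = rng.normal(size=(5, 2))
y = rng.integers(0, NUM_CLASSES, size=5)
for x_i, y_i in zip(X, y):
    print(y_i, *corrupt(x_i, y_i))
```

With such scores attached to each instance-label pair, the per-instance transition distribution can be estimated rather than assumed, which is the entry's point.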
This list is automatically generated from the titles and abstracts of the papers on this site.