Can Less be More? When Increasing-to-Balancing Label Noise Rates
Considered Beneficial
- URL: http://arxiv.org/abs/2107.05913v1
- Date: Tue, 13 Jul 2021 08:31:57 GMT
- Title: Can Less be More? When Increasing-to-Balancing Label Noise Rates
Considered Beneficial
- Authors: Yang Liu and Jialu Wang
- Abstract summary: We quantify the trade-offs introduced by increasing a certain group of instances' label noise rate.
We present a method to leverage our idea of inserting label noise for the task of learning with noisy labels.
- Score: 7.299247713124782
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we answer the question of when inserting label
noise (less informative labels) can instead yield more accurate and fair
models. We are primarily inspired by two observations: 1) increasing a
certain class of instances' label noise to balance the noise rates
(increasing-to-balancing) results in an easier learning problem; 2)
increasing-to-balancing improves fairness guarantees against label bias. We
first quantify the trade-offs introduced by increasing a certain group of
instances' label noise rate with respect to learning difficulty and
performance guarantees. We analytically demonstrate when such an increase is
beneficial, in terms of either improved generalization error or fairness
guarantees. We then present a method that leverages this idea of inserting
label noise for the task of learning with noisy labels, with or without a
fairness constraint. The primary technical challenge is that we do not know
which data instances suffer from higher noise, and we do not have ground
truth labels to verify any hypothesis.
detection method that informs us which group of labels might suffer from higher
noise, without using ground truth information. We formally establish the
effectiveness of the proposed solution and demonstrate it with extensive
experiments.
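The flip-probability arithmetic behind increasing-to-balancing can be sketched in a few lines. This is a minimal illustration, assuming binary labels and group noise rates that are already known (the paper's detection method for estimating them is not reproduced here); the function name and signature are illustrative, not the authors' implementation. If a group's labels are already wrong with rate e_low, flipping each with probability p yields a new rate e_low + p(1 - 2 e_low), so choosing p = (e_high - e_low) / (1 - 2 e_low) raises it to e_high.

```python
import random

def balance_noise_rates(labels, groups, e_low, e_high, low_group, seed=0):
    """Flip labels in the low-noise group with probability p so its
    effective noise rate rises from e_low to e_high.

    If a label is already noisy with rate e_low and we flip with
    probability p, the new rate is e_low*(1-p) + (1-e_low)*p
    = e_low + p*(1 - 2*e_low); solving for the target e_high gives p.
    Assumes binary labels in {0, 1} and hypothetical group ids.
    """
    p = (e_high - e_low) / (1.0 - 2.0 * e_low)
    rng = random.Random(seed)  # fixed seed for reproducibility
    out = []
    for y, g in zip(labels, groups):
        if g == low_group and rng.random() < p:
            out.append(1 - y)  # insert synthetic noise
        else:
            out.append(y)
    return out
```

With e_low = 0.1 and e_high = 0.3 this gives p = 0.25, i.e. a quarter of the low-noise group's labels are flipped to equalize the two groups' noise rates.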
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
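The prototype-based pseudo-labeling step can be illustrated with a deliberately simplified stand-in: per-class mean features as prototypes and nearest-prototype assignment. The paper's actual distribution-matching formulation is richer; the function names here are hypothetical.

```python
import math

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class -- a simple stand-in for
    distribution-matching prototypes."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def pseudo_label(x, prototypes):
    """Assign the class whose prototype is nearest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return min(range(len(prototypes)), key=lambda k: dist(x, prototypes[k]))
```

Instances whose pseudo-label agrees with the given (noisy) label would then be candidates for the clean, class-balanced subset.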
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement [37.4838454216137]
Few-Shot Human-in-the-Loop Refinement (FHLR) is a novel solution to address noisy label learning.
We show that FHLR achieves significantly better performance when learning from noisy labels.
Our work not only achieves better generalization in high-stakes health sensing benchmarks but also sheds light on how noise affects commonly-used models.
arXiv Detail & Related papers (2024-01-25T11:43:35Z)
- A law of adversarial risk, interpolation, and label noise [6.980076213134384]
In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy under many circumstances.
We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem relating label noise and adversarial risk in terms of the data distribution.
arXiv Detail & Related papers (2022-07-08T14:34:43Z)
- Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning with Label Noise [6.303101074386922]
Robust Label Refurbishment (Robust LR) is a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels.
We show that our method successfully alleviates the damage of both label noise and confirmation bias.
For example, Robust LR achieves up to 4.5% absolute top-1 accuracy improvement over the previous best on the real-world noisy dataset WebVision.
arXiv Detail & Related papers (2021-12-06T12:10:17Z)
- Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework, algo, that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalisation.
arXiv Detail & Related papers (2021-08-26T03:45:00Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels [98.13491369929798]
We propose a framework called Class2Simi, which transforms data points with noisy class labels to data pairs with noisy similarity labels.
Class2Simi is computationally efficient because not only is the transformation performed on-the-fly within mini-batches, but it also merely changes the loss on top of the model's predictions into a pairwise form.
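The label transformation at the heart of Class2Simi-style methods is straightforward to sketch: within a mini-batch, every pair of (possibly noisy) class labels becomes a binary similarity label. The pairwise loss change that sits on top of this is omitted here, and the function name is illustrative.

```python
def class_to_simi(labels):
    """Map a mini-batch of (possibly noisy) class labels to pairwise
    similarity labels: 1 if the two labels agree, else 0."""
    pairs = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append(((i, j), 1 if labels[i] == labels[j] else 0))
    return pairs
```

A batch of n examples yields n(n-1)/2 similarity labels, which is why the transformation is done per mini-batch rather than over the whole dataset.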
arXiv Detail & Related papers (2020-06-14T07:55:32Z)
- NoiseRank: Unsupervised Label Noise Reduction with Dependence Models [11.08987870095179]
We propose NoiseRank for unsupervised label noise reduction using Markov Random Fields (MRFs).
We construct a dependence model to estimate the posterior probability of an instance being incorrectly labeled given the dataset, and rank instances based on their estimated probabilities.
NoiseRank improves state-of-the-art classification on Food101-N (20% noise) and is effective on the high-noise Clothing-1M dataset (40% noise).
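The estimate-then-rank pattern can be illustrated with a deliberately crude proxy: score each instance by the fraction of its k nearest neighbours whose labels disagree with its own, then rank by that score. NoiseRank's actual posterior comes from an MRF dependence model, not this kNN heuristic; the names below are hypothetical.

```python
import math

def rank_by_disagreement(features, labels, k=3):
    """Rank instances by a simple proxy for P(mislabeled): the fraction
    of an instance's k nearest neighbours (Euclidean) whose labels
    disagree with its own. Highest-scoring (most suspect) first."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    scores = []
    for i, (x, y) in enumerate(zip(features, labels)):
        nbrs = sorted((j for j in range(len(labels)) if j != i),
                      key=lambda j: dist(x, features[j]))[:k]
        disagree = sum(labels[j] != y for j in nbrs) / k
        scores.append((disagree, i))
    return [i for _, i in sorted(scores, reverse=True)]
```

An instance sitting deep inside a cluster of differently-labeled points rises to the top of the ranking, which is the behaviour the posterior-probability ranking is after.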
arXiv Detail & Related papers (2020-03-15T01:10:25Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
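In that line of work, the model's m class outputs are augmented with an abstention output, and the per-example loss takes the form -log(p_y + p_abstain / o) for a payoff o in (1, m]. A minimal sketch, assuming the probabilities are already normalized over the m+1 outputs:

```python
import math

def gamblers_loss(probs, abstain_prob, target, payoff=2.5):
    """Gambler's loss for one example: the model may 'bet' probability
    mass on classes or place it on abstention; mass on abstention
    cushions a wrong bet, discounted by the payoff o in (1, m]."""
    return -math.log(probs[target] + abstain_prob / payoff)
```

Full abstention costs log(o) regardless of the true label, so on points with noisy labels it becomes cheaper to abstain than to fit the label, which is the "learning not to learn" effect the summary describes.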
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.