Robust Training under Label Noise by Over-parameterization
- URL: http://arxiv.org/abs/2202.14026v1
- Date: Mon, 28 Feb 2022 18:50:10 GMT
- Title: Robust Training under Label Noise by Over-parameterization
- Authors: Sheng Liu and Zhihui Zhu and Qing Qu and Chong You
- Abstract summary: We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, this simple method achieves state-of-the-art test accuracy against label noise on a variety of real datasets.
- Score: 41.03008228953627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, over-parameterized deep networks, with increasingly more network
parameters than training samples, have dominated the performance of modern
machine learning. However, it is well known that when the training data is
corrupted, over-parameterized networks tend to overfit and fail to generalize.
In this work, we propose a principled approach for robust training of
over-parameterized deep networks in classification tasks where a proportion of
training labels are corrupted. The main idea is simple: label noise
is sparse and incoherent with the network learned from clean data, so we model
the noise and learn to separate it from the data. Specifically, we model the
label noise via another sparse over-parameterization term, and exploit implicit
algorithmic regularizations to recover and separate the underlying corruptions.
Remarkably, this simple method achieves state-of-the-art test accuracy against
label noise on a variety of real
datasets. Furthermore, our experimental results are corroborated by theory on
simplified linear models, showing that exact separation between sparse noise
and low-rank data can be achieved under incoherent conditions. The work opens
many interesting directions for improving over-parameterized models by using
sparse over-parameterization and implicit regularization.
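To make the idea concrete, here is a minimal, self-contained sketch of sparse over-parameterization for label noise, reconstructed from the abstract rather than taken from the authors' released code: each training sample i gets a noise term s_i = u_i * u_i - v_i * v_i (element-wise) added to the prediction, and the implicit regularization of gradient descent on this factorized form biases s toward sparse solutions, so corrupted labels are absorbed by s instead of being memorized by the network. The toy data, MSE loss, and learning rates below are illustrative assumptions.

```python
# Hedged sketch of sparse over-parameterization against label noise
# (a reconstruction from the abstract, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d, k = 512, 20, 5                              # samples, input dim, classes
x = torch.randn(n, d)
y = torch.randint(0, k, (n,))
y[: n // 5] = torch.randint(0, k, (n // 5,))      # corrupt ~20% of the labels
y_onehot = F.one_hot(y, k).float()

net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, k))
# Per-sample noise factors; small nonzero init so gradients can flow.
u = (1e-4 * torch.randn(n, k)).requires_grad_()
v = (1e-4 * torch.randn(n, k)).requires_grad_()

opt_net = torch.optim.SGD(net.parameters(), lr=0.1)
opt_uv = torch.optim.SGD([u, v], lr=1.0)          # noise term gets its own LR

for epoch in range(200):
    s = u * u - v * v                             # sparse noise estimate
    # Fit prediction-plus-noise to the observed (possibly corrupted) labels;
    # the implicit bias of this factorization keeps s sparse, so the network
    # itself is steered toward the clean labels.
    loss = F.mse_loss(F.softmax(net(x), dim=1) + s, y_onehot)
    opt_net.zero_grad(); opt_uv.zero_grad()
    loss.backward()
    opt_net.step(); opt_uv.step()
```

Samples whose recovered s_i has a large entry are the ones the model believes were mislabeled, which also makes the term usable as a noise detector.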
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods present a co-training scheme that trains dual networks using samples associated with small losses.
We propose a simple yet effective robust training scheme that operates by training only a single network (a sketch of the underlying small-loss selection heuristic follows this entry).
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
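The small-loss criterion used by these co-training methods is easy to state in code. Below is a minimal sketch, assuming a standard PyTorch classifier; the helper name and keep_ratio are hypothetical, not from the cited paper.

```python
# Hedged sketch of small-loss sample selection for noisy labels.
import torch
import torch.nn.functional as F

def small_loss_update(net, optimizer, x, y, keep_ratio=0.7):
    """Update `net` using only the fraction of the batch with the smallest
    per-sample loss; DNNs tend to fit clean labels before memorizing noisy
    ones, so small-loss samples are more likely to be correctly labeled."""
    losses = F.cross_entropy(net(x), y, reduction="none")
    n_keep = max(1, int(keep_ratio * len(losses)))
    keep = losses.topk(n_keep, largest=False).indices
    optimizer.zero_grad()
    losses[keep].mean().backward()
    optimizer.step()
    return keep  # indices treated as likely-clean in this step
```

Co-training variants run two such networks and let each one select the small-loss samples used to update the other; the paper above argues that a single network with a temporal self-ensemble can do the job.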
- Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks [46.13404040937189]
Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.
This paper unveils a surprising property of adversarial noises when they are put together: noises crafted by one-step methods are linearly separable when paired with their corresponding labels (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-09T07:26:46Z)
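The claim can be checked empirically. Below is our own illustration, not the paper's code; the random network, epsilon, and probe settings are assumptions.

```python
# Hedged sketch: are one-step (FGSM-style) adversarial noises linearly
# separable given their labels? Fit a linear probe on (noise, label) pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d, k, eps = 2000, 100, 10, 0.1
x = torch.randn(n, d, requires_grad=True)
y = torch.randint(0, k, (n,))
net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, k))

# One-step adversarial noise: eps * sign of the input gradient of the loss.
F.cross_entropy(net(x), y).backward()
delta = (eps * x.grad.sign()).detach()

probe = nn.Linear(d, k)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    F.cross_entropy(probe(delta), y).backward()
    opt.step()
acc = (probe(delta).argmax(dim=1) == y).float().mean().item()
print(f"linear probe accuracy on noises: {acc:.2f}")  # near 1.0 => separable
```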
- Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels [44.79124350922491]
We propose a theoretically guaranteed noisy-label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).
Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels.
To make the framework scalable to datasets with a large number of categories and training samples, we propose a split algorithm that divides the whole training set into small pieces (a sketch of the penalized regression follows this entry).
arXiv Detail & Related papers (2022-03-15T11:09:58Z)
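A simplified reconstruction of the penalized-regression idea, with our own variable names and an illustrative l1 weight: one-hot labels are regressed on (frozen) network features plus a per-sample sparse correction, and samples needing a large correction are flagged as mislabeled.

```python
# Hedged sketch of penalized regression for noisy-label detection.
import torch

def detect_noisy(features, y_onehot, lam=0.05, lr=0.1, steps=500):
    """Regress one-hot labels on features with a per-sample sparse term
    gamma; a large ||gamma_i|| marks sample i as likely mislabeled."""
    n, d = features.shape
    k = y_onehot.shape[1]
    beta = torch.zeros(d, k, requires_grad=True)    # shared linear map
    gamma = torch.zeros(n, k, requires_grad=True)   # per-sample correction
    opt = torch.optim.SGD([beta, gamma], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        resid = features @ beta + gamma - y_onehot
        # The l1 penalty encourages sparsity in gamma: only a few samples
        # should need a label correction.
        loss = 0.5 * (resid ** 2).mean() + lam * gamma.abs().mean()
        loss.backward()
        opt.step()
    return gamma.norm(dim=1)       # rank samples by suspected noisiness

# The cited split algorithm would run this routine piecewise on shards of
# the training set to keep the regression problems small.
```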
- Synergistic Network Learning and Label Correction for Noise-robust Image Classification [28.27739181560233]
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z)
- Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space (see the neighbor-consistency sketch after this entry).
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
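One plausible instantiation of the feature-space-similarity idea, not the paper's exact loss; the cosine-similarity kNN and the KL form below are our assumptions.

```python
# Hedged sketch of a neighbor-consistency regularizer for noisy labels.
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, n_neighbors=5):
    """Pull each sample's prediction toward the average prediction of its
    nearest neighbors in feature space (cosine similarity)."""
    f = F.normalize(features.detach(), dim=1)
    sim = f @ f.t()
    sim.fill_diagonal_(float("-inf"))       # a sample is not its own neighbor
    nn_idx = sim.topk(n_neighbors, dim=1).indices   # (batch, n_neighbors)
    neighbor_p = F.softmax(logits[nn_idx], dim=2).mean(dim=1).detach()
    log_p = F.log_softmax(logits, dim=1)
    # Penalize divergence between a sample's own prediction and the
    # average prediction of its neighbors.
    return F.kl_div(log_p, neighbor_p, reduction="batchmean")
```

In training, this term would typically be added to the usual cross-entropy loss with a weighting coefficient, so that mislabeled samples are pulled toward the labels of their (mostly clean) neighbors.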
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)