Coresets for Robust Training of Neural Networks against Noisy Labels
- URL: http://arxiv.org/abs/2011.07451v1
- Date: Sun, 15 Nov 2020 04:58:11 GMT
- Title: Coresets for Robust Training of Neural Networks against Noisy Labels
- Authors: Baharan Mirzasoleiman, Kaidi Cao, Jure Leskovec
- Abstract summary: We propose a novel approach with strong theoretical guarantees for robust training of deep networks trained with noisy labels.
We select weighted subsets (coresets) of clean data points that provide an approximately low-rank Jacobian matrix.
Our experiments corroborate our theory and demonstrate that deep networks trained on our subsets achieve significantly superior performance compared to the state of the art.
- Score: 78.03027938765746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural networks have the capacity to overfit noisy labels frequently
found in real-world datasets. Although great progress has been made, existing
techniques are limited in providing theoretical guarantees for the performance
of the neural networks trained with noisy labels. Here we propose a novel
approach with strong theoretical guarantees for robust training of deep
networks trained with noisy labels. The key idea behind our method is to select
weighted subsets (coresets) of clean data points that provide an approximately
low-rank Jacobian matrix. We then prove that gradient descent applied to the
subsets does not overfit the noisy labels. Our extensive experiments corroborate
our theory and demonstrate that deep networks trained on our subsets achieve
significantly superior performance compared to the state of the art, e.g., a 6%
increase in accuracy on CIFAR-10 with 80% noisy labels, and a 7% increase in
accuracy on mini WebVision.
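Below is a minimal, illustrative sketch of the kind of weighted coreset selection the abstract describes: per-example gradient vectors are covered greedily (facility-location style) and each selected point is weighted by the size of its cluster. The function name `select_weighted_coreset`, the use of per-example gradients as features, and the greedy rule are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def select_weighted_coreset(grads: np.ndarray, k: int):
    """Greedily pick k points whose gradients cover the full gradient set.

    grads: (n, d) array of per-example (e.g. last-layer) gradients.
    Returns (indices, weights): each selected point is weighted by the number
    of examples whose most similar selected gradient it is.
    """
    n = grads.shape[0]
    # Pairwise similarities: larger when gradients are closer (facility location).
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    sims = dists.max() - dists

    selected = []
    best_cover = np.zeros(n)  # best similarity to any already-selected point
    for _ in range(k):
        # Marginal coverage gain of adding each candidate column.
        gains = np.maximum(sims, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sims[:, j])

    # Weight = size of each selected point's cluster (nearest-medoid assignment).
    assign = np.argmax(sims[:, selected], axis=1)
    weights = np.bincount(assign, minlength=k).astype(float)
    return np.array(selected), weights

# Toy usage: 200 fake gradient vectors, keep a weighted coreset of 20.
rng = np.random.default_rng(0)
idx, w = select_weighted_coreset(rng.normal(size=(200, 16)), k=20)
print(idx[:5], w[:5])
```

In this kind of scheme, the weighted coreset stands in for the full (noisy) dataset during training, with each selected example's loss scaled by its weight.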
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z) - Revisiting Class Imbalance for End-to-end Semi-Supervised Object
Detection [1.6249267147413524]
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
Many methods face challenges due to class imbalance, which hinders the effectiveness of the pseudo-label generator.
In this paper, we examine the root causes of low-quality pseudo-labels and present novel learning mechanisms to improve the label generation quality.
arXiv Detail & Related papers (2023-06-04T06:01:53Z) - Investigating Why Contrastive Learning Benefits Robustness Against Label
Noise [6.855361451300868]
Self-supervised contrastive learning has been shown to be very effective in preventing deep networks from overfitting noisy labels.
We rigorously prove that the representation matrix learned by contrastive learning boosts robustness.
arXiv Detail & Related papers (2022-01-29T05:19:26Z) - Unsupervised Representation Learning via Neural Activation Coding [66.65837512531729]
We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications.
We show that NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks.
arXiv Detail & Related papers (2021-12-07T21:59:45Z) - Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method can pseudo-label the very confident predictions, suppressing potential distribution drift.
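A minimal sketch of the confidence-thresholded pseudo-labeling mentioned above; the threshold value and helper name are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def pseudo_label(probs: np.ndarray, threshold: float = 0.95):
    """Keep only predictions whose top-class probability exceeds the threshold.

    probs: (n, num_classes) softmax outputs on unlabeled data.
    Returns (indices, labels) for the examples confident enough to pseudo-label.
    """
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Toy usage: only the first example is confident enough to pseudo-label.
p = np.array([[0.98, 0.01, 0.01],
              [0.40, 0.35, 0.25]])
print(pseudo_label(p))
```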
arXiv Detail & Related papers (2021-06-22T16:53:09Z) - Co-Seg: An Image Segmentation Framework Against Label Corruption [8.219887855003648]
Supervised deep learning performance is heavily tied to the availability of high-quality labels for training.
We propose a novel framework, namely Co-Seg, to collaboratively train segmentation networks on datasets which include low-quality noisy labels.
Our framework can be easily implemented in any segmentation algorithm to increase its robustness to noisy labels.
arXiv Detail & Related papers (2021-01-31T20:01:40Z) - Noisy Labels Can Induce Good Representations [53.47668632785373]
We study how architecture affects learning with noisy labels.
We show that training with noisy labels can induce useful hidden representations, even when the model generalizes poorly.
This finding leads to a simple method to improve models trained on noisy labels.
arXiv Detail & Related papers (2020-12-23T18:58:05Z) - Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z) - Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data can be very costly and error-prone, so it is difficult to guarantee annotation quality.
We propose Temporal Calibrated Regularization (TCR), which uses the original labels together with the predictions from the previous epoch.
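A minimal sketch of the idea described in this entry, assuming the original labels and the previous epoch's predictions are blended into a soft training target; the mixing rule and the alpha parameter are illustrative, not the exact TCR formulation.

```python
import numpy as np

def soft_targets(labels: np.ndarray, prev_probs: np.ndarray,
                 num_classes: int, alpha: float = 0.7) -> np.ndarray:
    """Blend one-hot labels with previous-epoch predictions: alpha * one_hot + (1 - alpha) * prev."""
    one_hot = np.eye(num_classes)[labels]
    return alpha * one_hot + (1.0 - alpha) * prev_probs

def soft_cross_entropy(targets: np.ndarray, probs: np.ndarray) -> float:
    """Mean cross-entropy of current predictions against soft targets."""
    return float(-(targets * np.log(probs + 1e-12)).sum(axis=1).mean())

# Toy usage: 4 examples, 3 classes, uniform previous-epoch predictions.
labels = np.array([0, 2, 1, 1])
prev = np.full((4, 3), 1.0 / 3)
probs = np.full((4, 3), 1.0 / 3)
print(soft_cross_entropy(soft_targets(labels, prev, num_classes=3), probs))
```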
arXiv Detail & Related papers (2020-07-01T04:48:49Z) - Iterative Label Improvement: Robust Training by Confidence Based
Filtering and Dataset Partitioning [5.1293809610257775]
State-of-the-art, high-capacity deep neural networks require large amounts of labelled training data.
They are also highly susceptible to label errors in this data.
We propose a novel meta training and labelling scheme that is able to use inexpensive unlabelled data.
arXiv Detail & Related papers (2020-02-07T10:42:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information above) and is not responsible for any consequences of its use.