Deep Learning on a Data Diet: Finding Important Examples Early in
Training
- URL: http://arxiv.org/abs/2107.07075v2
- Date: Tue, 28 Mar 2023 13:51:14 GMT
- Title: Deep Learning on a Data Diet: Finding Important Examples Early in
Training
- Authors: Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite
- Abstract summary: In vision datasets, simple scores can be used to identify important examples very early in training.
We propose two such scores -- the Gradient Normed (GraNd) and the Error L2-Norm (EL2N) scores.
- Score: 35.746302913918484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent success in deep learning has partially been driven by training
increasingly overparametrized networks on ever larger datasets. It is therefore
natural to ask: how much of the data is superfluous, which examples are
important for generalization, and how do we find them? In this work, we make
the striking observation that, in standard vision datasets, simple scores
averaged over several weight initializations can be used to identify important
examples very early in training. We propose two such scores -- the Gradient
Normed (GraNd) and the Error L2-Norm (EL2N) scores -- and demonstrate their
efficacy on a range of architectures and datasets by pruning significant
fractions of training data without sacrificing test accuracy. In fact, using
EL2N scores calculated a few epochs into training, we can prune half of the
CIFAR10 training set while slightly improving test accuracy. Furthermore, for a
given dataset, EL2N scores from one architecture or hyperparameter
configuration generalize to other configurations. Compared to recent work that
prunes data by discarding examples that are rarely forgotten over the course of
training, our scores use only local information early in training. We also use
our scores to detect noisy examples and study training dynamics through the
lens of important examples -- we investigate how the data distribution shapes
the loss surface and identify subspaces of the model's data representation that
are relatively stable over training.
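The EL2N score described in the abstract is simple enough to sketch directly: for each training example, it is the L2 norm of the error vector (softmax output minus one-hot label), averaged over several independent weight initializations; examples with the lowest scores are then pruned. A minimal NumPy sketch under those assumptions (function names and array shapes here are illustrative, not from the paper):

```python
import numpy as np

def el2n_scores(probs, labels):
    """EL2N score per training example: L2 norm of the error vector
    (softmax output minus one-hot label), averaged over several
    independent weight initializations.

    probs:  (num_inits, num_examples, num_classes) softmax outputs
    labels: (num_examples,) integer class labels
    """
    num_classes = probs.shape[-1]
    onehot = np.eye(num_classes)[labels]      # (num_examples, num_classes)
    errors = probs - onehot[None, :, :]       # broadcast over initializations
    norms = np.linalg.norm(errors, axis=-1)   # (num_inits, num_examples)
    return norms.mean(axis=0)                 # (num_examples,)

def keep_high_score_subset(scores, keep_frac=0.5):
    """Indices of the highest-scoring (most important) examples to keep,
    discarding the low-score fraction as in the paper's pruning setup."""
    k = int(round(len(scores) * keep_frac))
    return np.sort(np.argsort(scores)[-k:])
```

In this sketch, an example the networks already classify confidently and correctly has a small error vector and thus a low score, so it is pruned first; hard or atypical examples keep high scores and are retained.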
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning [20.404705741136777]
We introduce GraNd and its estimated version, EL2N, as scoring metrics for finding important examples in a dataset.
We show that by pruning a small portion of the examples with the highest GraNd/EL2N scores, we can not only preserve the test accuracy, but also surpass it.
arXiv Detail & Related papers (2022-11-10T14:37:23Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch achieves final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- On the Pitfalls of Learning with Limited Data: A Facial Expression Recognition Case Study [0.5249805590164901]
We focus on the problem of Facial Expression Recognition from videos.
We performed an extensive study with four databases of different complexity and nine deep-learning architectures for video classification.
We found that complex training sets translate better to more stable test sets when trained with transfer learning and synthetically generated data.
arXiv Detail & Related papers (2021-04-02T18:53:41Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the resulting dataset tractable, we propose to apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.