What Neural Networks Memorize and Why: Discovering the Long Tail via
Influence Estimation
- URL: http://arxiv.org/abs/2008.03703v1
- Date: Sun, 9 Aug 2020 10:12:28 GMT
- Title: What Neural Networks Memorize and Why: Discovering the Long Tail via
Influence Estimation
- Authors: Vitaly Feldman and Chiyuan Zhang
- Abstract summary: Deep learning algorithms are well-known to have a propensity for fitting the training data very well.
Such fitting requires memorization of training data labels.
We propose a theoretical explanation for this phenomenon based on a combination of two insights.
- Score: 37.5845376458136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms are well-known to have a propensity for fitting the
training data very well and often fit even outliers and mislabeled data points.
Such fitting requires memorization of training data labels, a phenomenon that
has attracted significant research interest but has not been given a compelling
explanation so far. A recent work of Feldman (2019) proposes a theoretical
explanation for this phenomenon based on a combination of two insights. First,
natural image and data distributions are (informally) known to be long-tailed,
that is have a significant fraction of rare and atypical examples. Second, in a
simple theoretical model such memorization is necessary for achieving
close-to-optimal generalization error when the data distribution is
long-tailed. However, no direct empirical evidence for this explanation or even
an approach for obtaining such evidence were given.
In this work we design experiments to test the key ideas in this theory. The
experiments require estimation of the influence of each training example on the
accuracy at each test example as well as memorization values of training
examples. Estimating these quantities directly is computationally prohibitive
but we show that closely-related subsampled influence and memorization values
can be estimated much more efficiently. Our experiments demonstrate the
significant benefits of memorization for generalization on several standard
benchmarks. They also provide quantitative and visually compelling evidence for
the theory put forth in (Feldman, 2019).
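The subsampled estimator described above can be sketched as a Monte Carlo procedure: train many models on random subsets of the data and, for each training example, compare accuracy on that example between models whose subset included it and models whose subset excluded it. This is a minimal illustration, not the authors' implementation; `train_fn` is a hypothetical callback returning a fitted model with a `.predict` method.

```python
import numpy as np

def estimate_memorization(train_fn, X, y, n_models=100, subset_frac=0.7, rng_seed=0):
    """Monte Carlo estimate of subsampled memorization values.

    For each training example i, the estimate is
        P[model correct on i | i in training subset]
      - P[model correct on i | i not in training subset],
    averaged over models trained on random subsets.
    `train_fn(X_sub, y_sub)` is a hypothetical callback that returns a
    model with a `.predict(X)` method.
    """
    rng = np.random.default_rng(rng_seed)
    n = len(X)
    correct_in = np.zeros(n); count_in = np.zeros(n)
    correct_out = np.zeros(n); count_out = np.zeros(n)
    m = int(subset_frac * n)
    for _ in range(n_models):
        subset = rng.choice(n, size=m, replace=False)
        mask = np.zeros(n, dtype=bool)
        mask[subset] = True
        model = train_fn(X[mask], y[mask])
        correct = (model.predict(X) == y).astype(float)
        correct_in[mask] += correct[mask];   count_in[mask] += 1
        correct_out[~mask] += correct[~mask]; count_out[~mask] += 1
    # Guard against examples that were never in (or never out of) a subset.
    p_in = correct_in / np.maximum(count_in, 1)
    p_out = correct_out / np.maximum(count_out, 1)
    return p_in - p_out
```

The same machinery yields subsampled influence on a test example by replacing the per-example accuracy with accuracy on that test point.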
Related papers
- Stochastic Amortization: A Unified Approach to Accelerate Feature and
Data Attribution [67.28273187033693]
We show that training a network that directly predicts the desired output, known as amortization, is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z) - Empirical Design in Reinforcement Learning [28.06268918627829]
It is now common to benchmark agents with millions of parameters against dozens of tasks, each using the equivalent of 30 days of experience.
The scale of these experiments often conflicts with the need for proper statistical evidence, especially when comparing algorithms.
This manuscript represents both a call to action, and a comprehensive resource for how to do good experiments in reinforcement learning.
arXiv Detail & Related papers (2023-04-03T19:32:24Z) - Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, while seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
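The SSFT idea above can be sketched as a small loop: after pretraining on split A, keep training on split B only and record, for each split-A example, the first epoch after which it is never predicted correctly again. This is a schematic illustration, not the paper's implementation; `fit_epoch` and `predict` are hypothetical callbacks.

```python
import numpy as np

def second_split_forgetting_time(model, fit_epoch, predict, X_a, y_a, X_b, y_b, n_epochs):
    """Sketch of second-split forgetting time (SSFT).

    Assumes `model` was already trained on split A. We continue training on
    split B only; `fit_epoch(model, X, y)` runs one training epoch in place
    and `predict(model, X)` returns predicted labels (both hypothetical).
    Returns, per split-A example, the epoch after which it stays forgotten,
    or -1 if it is still remembered at the end.
    """
    n = len(X_a)
    last_correct = np.full(n, -1)
    for epoch in range(n_epochs):
        fit_epoch(model, X_b, y_b)
        correct = predict(model, X_a) == y_a
        last_correct[correct] = epoch      # most recent epoch with a correct prediction
    ssft = last_correct + 1                # first epoch of permanent forgetting
    ssft[last_correct == n_epochs - 1] = -1  # never forgotten within the budget
    return ssft
```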
arXiv Detail & Related papers (2022-10-26T21:03:46Z) - An Empirical Study of Memorization in NLP [8.293936347234126]
We use three different NLP tasks to check if the long-tail theory holds.
Experiments demonstrate that top-ranked memorized training instances are likely atypical.
We develop an attribution method to better understand why a training instance is memorized.
arXiv Detail & Related papers (2022-03-23T03:27:56Z) - Impact of Pretraining Term Frequencies on Few-Shot Reasoning [51.990349528930125]
We investigate how well pretrained language models reason with terms that are less frequent in the pretraining data.
We measure the strength of this correlation for a number of GPT-based language models on various numerical deduction tasks.
Although LMs exhibit strong performance at few-shot numerical reasoning tasks, our results raise the question of how much models actually generalize beyond pretraining data.
arXiv Detail & Related papers (2022-02-15T05:43:54Z) - Understanding Memorization from the Perspective of Optimization via
Efficient Influence Estimation [54.899751055620904]
We study the phenomenon of memorization with turn-over dropout, an efficient method to estimate influence and memorization, for data with true labels (real data) and data with random labels (random data).
Our main findings are: (i) for both real and random data, the network optimizes easy examples (e.g., real data) and difficult examples (e.g., random data) simultaneously, learning the easy ones faster; (ii) for real data, a correctly labeled difficult example in the training dataset is more informative than an easy one.
arXiv Detail & Related papers (2021-12-16T11:34:23Z) - Deep Learning Through the Lens of Example Difficulty [21.522182447513632]
We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth.
Our investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point.
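A simplified reading of the prediction-depth idea: given per-layer probe predictions for one input (the probes themselves, e.g. kNN classifiers on intermediate representations, are assumed and not shown here), the depth is the earliest layer from which every later probe already agrees with the network's final prediction. This sketch is our own distillation, not the paper's exact definition.

```python
def prediction_depth(layer_preds, final_pred):
    """Return the earliest layer index from which all probe predictions
    agree with the network's final prediction.

    `layer_preds` is a list of per-layer probe outputs for one input;
    `final_pred` is the network's final output. If no suffix of layers
    agrees, the depth equals the number of layers (hardest case)."""
    depth = len(layer_preds)  # worst case: no probe ever agrees
    for k in reversed(range(len(layer_preds))):
        if layer_preds[k] == final_pred:
            depth = k
        else:
            break
    return depth
```

Under this definition, an "easy" input is resolved in early layers (small depth) while a hard or atypical one is only resolved near the output.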
arXiv Detail & Related papers (2021-06-17T16:48:12Z) - A Theoretical Analysis of Learning with Noisily Labeled Data [62.946840431501855]
We first show that during the first training epoch, examples with clean labels are learned first.
We then show that, after this clean-data learning stage, continued training can further reduce the test error.
arXiv Detail & Related papers (2021-04-08T23:40:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.