The Curious Case of Benign Memorization
- URL: http://arxiv.org/abs/2210.14019v1
- Date: Tue, 25 Oct 2022 13:41:31 GMT
- Title: The Curious Case of Benign Memorization
- Authors: Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann
- Abstract summary: We show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way.
We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers.
- Score: 19.74244993871716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the empirical advances of deep learning across a variety of learning
tasks, our theoretical understanding of its success is still very restricted.
One of the key challenges is the overparametrized nature of modern models,
enabling complete overfitting of the data even if the labels are randomized,
i.e. networks can completely memorize all given patterns. While such a
memorization capacity seems worrisome, in this work we show that under training
protocols that include data augmentation, neural networks learn to memorize
entirely random labels in a benign way, i.e. they learn embeddings that lead to
highly non-trivial performance under nearest neighbour probing. We demonstrate
that deep models have the surprising ability to separate noise from signal by
distributing the task of memorization and feature learning to different layers.
As a result, only the very last layers are used for memorization, while
preceding layers encode performant features which remain largely unaffected by
the label noise. We explore the intricate role of the augmentations used for
training and identify a memorization-generalization trade-off in terms of their
diversity, marking a clear distinction from all previous works. Finally, we give
a first explanation for the emergence of benign memorization by showing that
malign memorization under data augmentation is infeasible due to the
insufficient capacity of the model for the increased sample size. As a
consequence, the network is forced to leverage the correlated nature of the
augmentations and as a result learns meaningful features. To complete the
picture, a better theory of feature learning in deep neural networks is
required to fully understand the origins of this phenomenon.
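To make the probing protocol concrete, the following is a minimal PyTorch sketch of the setup described in the abstract, not the authors' implementation: the small CNN, CIFAR-10, the optimizer settings, the training length, and the 1-NN cosine probe are all placeholder assumptions. The model is trained on fully randomized labels under standard augmentations, and the penultimate-layer embeddings are then probed with the true labels.

```python
# Hedged sketch of benign-memorization probing (assumptions: CIFAR-10, toy CNN,
# 1-NN cosine probe); not the paper's actual architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard augmentations used while training on randomized labels.
train_tf = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
eval_tf = T.ToTensor()

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
probe_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=eval_tf)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=eval_tf)

# Replace every training label with a random one (the full label-noise setting).
g = torch.Generator().manual_seed(0)
train_set.targets = torch.randint(0, 10, (len(train_set),), generator=g).tolist()

model = nn.Sequential(  # toy CNN; the paper uses larger networks
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # penultimate-layer embedding (256-d)
    nn.Linear(256, 10),                     # head that ends up doing the memorization
).to(device)

opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=2)

for epoch in range(50):  # placeholder; memorizing random labels typically needs long training
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

model.eval()

@torch.no_grad()
def embed(dataset):
    backbone = model[:-1]  # everything before the linear head
    feats, labels = [], []
    for x, y in torch.utils.data.DataLoader(dataset, batch_size=512):
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

# Nearest-neighbour probe: classify each test embedding by the TRUE label of its
# nearest training embedding (probe_set kept the original, un-noised labels).
train_feats, train_true = embed(probe_set)
test_feats, test_true = embed(test_set)
train_feats = F.normalize(train_feats, dim=1)
test_feats = F.normalize(test_feats, dim=1)
nearest = (test_feats @ train_feats.T).argmax(dim=1)  # 1-NN by cosine similarity (memory-heavy; chunk in practice)
acc = (train_true[nearest] == test_true).float().mean()
print(f"1-NN probe accuracy with true labels: {acc:.3f}")
```

Under benign memorization, this probe should score well above the 10% chance level for CIFAR-10 even though every training label seen by the network was random.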
Related papers
- Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models [7.50189359952191]
We show that sequences which are not memorized after the first encounter can be "uncovered" throughout the course of training.
The presence of latent memorization presents a challenge for data privacy as memorized sequences may be hidden at the final checkpoint of the model.
We develop a diagnostic test relying on the cross-entropy loss to uncover latent memorized sequences with high accuracy (a generic sketch of this kind of loss-based check appears after this list).
arXiv Detail & Related papers (2024-06-20T17:56:17Z)
- Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
- MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels [19.650299232829546]
We propose an iterative selection approach based on the Weibull mixture model to identify clean data.
In particular, we measure each instance's difficulty of memorization via its transition times between being misclassified and being memorized.
Our strategy outperforms existing noisy-label learning methods.
arXiv Detail & Related papers (2023-06-20T14:26:53Z)
- Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
- Continual Learning by Modeling Intra-Class Variation [33.30614232534283]
It has been observed that neural networks perform poorly when the data or tasks are presented sequentially.
Unlike humans, neural networks suffer greatly from catastrophic forgetting, making it impossible to perform life-long learning.
We examine memory-based continual learning and identify that large variation in the representation space is crucial for avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-10-11T12:17:43Z)
- Exploring Memorization in Adversarial Training [58.38336773082818]
We investigate the memorization effect in adversarial training (AT) for promoting a deeper understanding of capacity, convergence, generalization, and especially robust overfitting.
We propose a new mitigation algorithm motivated by detailed memorization analyses.
arXiv Detail & Related papers (2021-06-03T05:39:57Z)
- Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning [43.504548777955854]
We study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process.
We prove that contrastive learning using ReLU networks learns the desired sparse features if proper augmentations are adopted.
arXiv Detail & Related papers (2021-05-31T16:42:09Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Encoding-based Memory Modules for Recurrent Neural Networks [79.42778415729475]
We study the memorization subtask from the point of view of the design and training of recurrent neural networks.
We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences.
arXiv Detail & Related papers (2020-01-31T11:14:27Z)
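As a rough illustration of the cross-entropy-loss diagnostic mentioned in the "Uncovering Latent Memories" entry above, here is a hedged sketch, not that paper's actual test: the GPT-2 checkpoint, candidate strings, and threshold are placeholder assumptions. Sequences with unusually low per-token cross-entropy under the language model are flagged as possibly memorized.

```python
# Hedged sketch: flag candidate sequences whose per-token cross-entropy under a
# language model is unusually low, as a crude proxy for memorization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def per_token_ce(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    return lm(ids, labels=ids).loss.item()            # HF shifts labels internally

candidates = ["the quick brown fox jumps over the lazy dog", "asdkj qwpoe zxnmv"]
threshold = 2.0                                       # placeholder; calibrate on held-out data
for s in candidates:
    ce = per_token_ce(s)
    flag = "possibly memorized" if ce < threshold else "not flagged"
    print(f"{ce:5.2f}  {flag}  {s!r}")
```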
This list is automatically generated from the titles and abstracts of the papers on this site.