Decoding Generalization from Memorization in Deep Neural Networks
- URL: http://arxiv.org/abs/2501.14687v1
- Date: Fri, 24 Jan 2025 18:01:27 GMT
- Title: Decoding Generalization from Memorization in Deep Neural Networks
- Authors: Simran Ketha, Venkatakrishnan Ramaswamy
- Abstract summary: Deep Neural Networks that generalize well have been key to the dramatic success of Deep Learning in recent years.
It is known that deep networks can memorize training data, as evidenced by perfect or high training accuracies on models trained with corrupted data whose class labels have been shuffled to varying degrees.
Here, we provide empirical evidence that such models nonetheless retain, in their representations, information sufficient for substantially improved generalization, even in the face of memorization.
- Abstract: Overparameterized Deep Neural Networks that generalize well have been key to the dramatic success of Deep Learning in recent years. The reasons for their remarkable ability to generalize are not yet well understood. It is also known that deep networks can memorize training data, as evidenced by perfect or high training accuracies on models trained with corrupted data whose class labels have been shuffled to varying degrees. Concomitantly, such models are known to generalize poorly, i.e. they suffer from poor test accuracies, which is why the act of memorizing is thought to substantially degrade the ability to generalize. It has, however, been unclear why the poor generalization that accompanies such memorization comes about. One possibility is that, in the process of training with corrupted data, the layers of the network irretrievably reorganize their representations in a manner that makes generalization difficult. The other possibility is that the network retains significant ability to generalize, but the trained network somehow chooses to read out in a manner that is detrimental to generalization. Here, we provide evidence for the latter possibility by demonstrating, empirically, that such models possess information in their representations for substantially improved generalization, even in the face of memorization. Furthermore, such generalization abilities can be easily decoded from the internals of the trained model, and we build a technique to do so from the outputs of specific layers of the network. We demonstrate results on multiple models trained with a number of standard datasets.
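The abstract describes decoding generalization ability from the outputs of specific layers. A common way such decoding is done in practice is to fit a simple linear probe on intermediate-layer activations; the sketch below illustrates that idea on synthetic "activations" (the data, dimensions, and least-squares probe are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for intermediate-layer outputs of a memorizing
# network: two classes whose hidden representations remain linearly
# separable even if the network's own readout generalizes poorly.
n_per_class, dim = 200, 32
feats0 = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, dim))
feats1 = rng.normal(loc=+1.0, scale=1.0, size=(n_per_class, dim))
X = np.vstack([feats0, feats1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Shuffle, then split into probe-train and probe-test sets.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
X_tr, X_te = X[:300], X[300:]
y_tr, y_te = y[:300], y[300:]

# Linear probe fitted by least squares against +/-1 targets
# (bias handled by appending a constant feature).
targets = 2 * y_tr - 1
A_tr = np.hstack([X_tr, np.ones((len(X_tr), 1))])
w, *_ = np.linalg.lstsq(A_tr, targets, rcond=None)

# High probe accuracy indicates the representation still carries
# the information needed to generalize.
A_te = np.hstack([X_te, np.ones((len(X_te), 1))])
acc = np.mean((A_te @ w > 0) == (y_te == 1))
print(f"probe test accuracy: {acc:.2f}")
```

If the probe generalizes well while the network's own output layer does not, that supports the paper's "detrimental readout" explanation over irretrievable reorganization of the representations.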
Related papers
- To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets [5.854190253899593]
We study an interpretable model where generalizing representations are understood analytically, and are easily distinguishable from the memorizing ones.
We show that it is possible for the network to memorize the corrupted labels and, at the same time, achieve 100% generalization.
We also show that in the presence of regularization, the training dynamics involves two consecutive stages.
arXiv Detail & Related papers (2023-10-19T18:01:10Z) - On information captured by neural networks: connections with memorization and generalization [4.082286997378594]
We study information captured by neural networks during training.
We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds.
Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.
arXiv Detail & Related papers (2023-06-28T04:46:59Z) - Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only later in training exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - The Curious Case of Benign Memorization [19.74244993871716]
We show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way.
We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers.
arXiv Detail & Related papers (2022-10-25T13:41:31Z) - Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z) - Generalization Through The Lens Of Leave-One-Out Error [22.188535244056016]
We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
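The kernel-regime result above has a concrete computational payoff: for kernel ridge regression, leave-one-out residuals come in closed form from the smoother (hat) matrix, with no refitting, via e_i = (y_i - ŷ_i) / (1 - H_ii). A minimal numpy sketch verifying the identity against brute-force refitting (the data, RBF kernel, and regularization strength are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression data (illustrative choice).
n = 40
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

def rbf(A, B, gamma=0.5):
    """RBF (Gaussian) kernel matrix between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-2                                     # ridge regularization
K = rbf(X, X)
H = K @ np.linalg.inv(K + lam * np.eye(n))     # hat matrix: y_hat = H y
y_hat = H @ y

# Closed-form leave-one-out residuals: e_i = (y_i - y_hat_i) / (1 - H_ii)
loo_closed = (y - y_hat) / (1 - np.diag(H))

# Brute-force check: actually refit with each point held out.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ki = rbf(X[mask], X[mask])
    alpha = np.linalg.solve(Ki + lam * np.eye(n - 1), y[mask])
    pred_i = rbf(X[i : i + 1], X[mask]) @ alpha
    loo_brute[i] = y[i] - pred_i[0]

max_gap = np.max(np.abs(loo_closed - loo_brute))
print(f"max |closed-form - brute-force| LOO gap: {max_gap:.2e}")
```

The closed form turns n refits into one matrix factorization, which is what makes leave-one-out a tractable generalization estimate in the kernel regime.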
arXiv Detail & Related papers (2022-03-07T14:56:00Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Exploring Memorization in Adversarial Training [58.38336773082818]
We investigate the memorization effect in adversarial training (AT) for promoting a deeper understanding of capacity, convergence, generalization, and especially robust overfitting.
We propose a new mitigation algorithm motivated by detailed memorization analyses.
arXiv Detail & Related papers (2021-06-03T05:39:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.