Related papers: Generalizability of Memorization Neural Networks

Generalizability of Memorization Neural Networks

URL: http://arxiv.org/abs/2411.00372v1
Date: Fri, 01 Nov 2024 05:18:46 GMT
Title: Generalizability of Memorization Neural Networks
Authors: Lijia Yu, Xiao-Shan Gao, Lijun Zhang, Yibo Miao,
Abstract summary: memorization is widely believed to have a close relationship with the strong generalizability of deep learning. We show that, in order for the memorization networks to be generalizable, the width of the network must be at least equal to the dimension of the data. It is also shown that there exist data distributions such that, to be generalizable for them, the memorization network must have an exponential number of parameters in the data dimension.
Score: 13.144557876007358
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The neural network memorization problem is to study the expressive power of neural networks to interpolate a finite dataset. Although memorization is widely believed to have a close relationship with the strong generalizability of deep learning when using over-parameterized models, to the best of our knowledge, there exists no theoretical study on the generalizability of memorization neural networks. In this paper, we give the first theoretical analysis of this topic. Since using i.i.d. training data is a necessary condition for a learning algorithm to be generalizable, memorization and its generalization theory for i.i.d. datasets are developed under mild conditions on the data distribution. First, algorithms are given to construct memorization networks for an i.i.d. dataset, which have the smallest number of parameters and even a constant number of parameters. Second, we show that, in order for the memorization networks to be generalizable, the width of the network must be at least equal to the dimension of the data, which implies that the existing memorization networks with an optimal number of parameters are not generalizable. Third, a lower bound for the sample complexity of general memorization algorithms and the exact sample complexity for memorization algorithms with constant number of parameters are given. It is also shown that there exist data distributions such that, to be generalizable for them, the memorization network must have an exponential number of parameters in the data dimension. Finally, an efficient and generalizable memorization algorithm is given when the number of training samples is greater than the efficient memorization sample complexity of the data distribution.

Related papers

Generalization performance of narrow one-hidden layer networks in the teacher-student setting [40.69556943879117]
We develop a general theory for narrow networks, i.e. networks with a large number of hidden units, yet much smaller than the input dimension.<n>Our theory accurately predicts the generalization error of neural networks trained on regression or classification tasks.
arXiv Detail & Related papers (2025-07-01T10:18:20Z)
Trade-offs in Data Memorization via Strong Data Processing Inequalities [19.969359347811398]
Recent research demonstrated that training large language models involves memorization of a significant fraction of training data.<n>Such memorization can lead to privacy violations when training on sensitive user data and thus motivates the study of data memorization's role in learning.<n>We develop a general approach for proving lower bounds on excess data memorization, that relies on a new connection between strong data processing inequalities and data memorization.
arXiv Detail & Related papers (2025-06-02T16:41:49Z)
Learning of networked spreading models from noisy and incomplete data [7.669018800404791]
We introduce a universal learning method based on scalable dynamic message-passing technique. The algorithm leverages available prior knowledge on the model and on the data, and reconstructs both network structure and parameters of a spreading model. We show that a linear computational complexity of the method with the key model parameters makes the algorithm scalable to large network instances.
arXiv Detail & Related papers (2023-12-20T13:12:47Z)
Memorization with neural nets: going beyond the worst case [5.662924503089369]
In practice, deep neural networks are often able to easily interpolate their training data. For real-world data, however, one intuitively expects the presence of a benign structure so that already occurs at a smaller network size than suggested by memorization capacity. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in time.
arXiv Detail & Related papers (2023-09-30T10:06:05Z)
Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [4.278591555984394]
We study generalization of neural networks on small algorithmically generated datasets. We show that neural networks learn through a process of "grokking" a pattern in the data. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning.
arXiv Detail & Related papers (2022-01-06T18:43:37Z)
Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. An open question in previous studies of language model memorization is how to filter out "common" memorization. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training.
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation [54.899751055620904]
We study the phenomenon of memorization with turn-over dropout, an efficient method to estimate influence and memorization, for data with true labels (real data) and data with random labels (random data) Our main findings are: (i) For both real data and random data, the optimization of easy examples (e.g., real data) and difficult examples (e.g., random data) are conducted by the network simultaneously, with easy ones at a higher speed; (ii) For real data, a correct difficult example in the training dataset is more informative than an easy one.
arXiv Detail & Related papers (2021-12-16T11:34:23Z)
Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution. We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z)
Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena [17.205106391379026]
In the context of neural network models, overparametrization refers to the phenomena whereby these models appear to generalize well on the unseen data. A conventional explanation of this phenomena is based on self-regularization properties of algorithms used to train the data. We show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension.
arXiv Detail & Related papers (2020-03-23T20:09:31Z)
Encoding-based Memory Modules for Recurrent Neural Networks [79.42778415729475]
We study the memorization subtask from the point of view of the design and training of recurrent neural networks. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences.
arXiv Detail & Related papers (2020-01-31T11:14:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.