Why Deep Learning Generalizes
- URL: http://arxiv.org/abs/2211.09639v1
- Date: Thu, 17 Nov 2022 16:39:43 GMT
- Title: Why Deep Learning Generalizes
- Authors: Benjamin L. Badger
- Abstract summary: We find that memorization is difficult relative to generalization, but that adding noise makes memorization easier.
We show that generalization results from a model's parameters being attracted to points of maximal stability with respect to that model's inputs during gradient descent.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Very large deep learning models trained using gradient descent are remarkably
resistant to memorization given their huge capacity, but are at the same time
capable of fitting large datasets of pure noise. Here, methods are introduced
by which models may be trained to memorize datasets that are normally
generalized.
We find that memorization is difficult relative to generalization, but that
adding noise makes memorization easier. Increasing the dataset size exaggerates
the characteristics of that dataset: model access to more training samples
makes overfitting easier for random data, but somewhat harder for natural
images. The bias of deep learning towards generalization is explored
theoretically, and we show that generalization results from a model's
parameters being attracted to points of maximal stability with respect to that
model's inputs during gradient descent.
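The abstract's two core claims, that pure-noise labels are harder to fit than structured ones and that generalization tracks stability with respect to inputs, can be illustrated with a toy experiment. Below is a minimal sketch assuming PyTorch and a synthetic binary task; the network, the data, and the input-gradient norm used as a stability proxy are illustrative choices, not the paper's actual setup.

```python
# Illustrative sketch (not the paper's code): fit the same small network to a
# learnable rule vs. pure-noise labels, and track the gradient norm of the loss
# w.r.t. the inputs as a rough proxy for stability under input perturbations.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_data(n=512, d=32, random_labels=False):
    x = torch.randn(n, d)
    if random_labels:
        y = torch.randint(0, 2, (n,))       # pure-noise targets (memorization)
    else:
        y = (x[:, 0] + x[:, 1] > 0).long()  # learnable rule (generalization)
    return x, y

def input_grad_norm(model, x, y):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (g,) = torch.autograd.grad(loss, x)
    return g.norm(dim=1).mean().item()

for random_labels in (False, True):
    x, y = make_data(random_labels=random_labels)
    model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(2000):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    print(f"random_labels={random_labels}: train_acc={acc:.2f}, "
          f"input_grad_norm={input_grad_norm(model, x, y):.3f}")
```

Under the paper's account, the random-label run should take more optimization effort to reach the same training accuracy, and adding input noise to the structured task (e.g. `x += noise_scale * torch.randn_like(x)`, a hypothetical perturbation) should make memorization easier to induce.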
Related papers
- The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [84.30018805150607]
We present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data.
We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear heads, and QLoRA.
We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied.
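Of the finetuning methods named above, a linear head is the simplest to sketch: fit a linear probe on frozen features from easy examples and evaluate it on hard ones. This is a hypothetical illustration with synthetic stand-ins for the frozen model features and the easy/hard split, not the paper's setup.

```python
# Hypothetical easy-to-hard evaluation with a linear head. `easy_X`/`hard_X`
# stand in for frozen LM features; here they are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
w = rng.normal(size=64)                       # shared ground-truth direction
easy_X = rng.normal(size=(500, 64))           # low-variance ("easy") examples
hard_X = rng.normal(size=(500, 64)) * 3.0     # high-variance ("hard") examples
easy_y = (easy_X @ w > 0).astype(int)
hard_y = (hard_X @ w > 0).astype(int)

head = LogisticRegression(max_iter=1000).fit(easy_X, easy_y)  # train on easy only
print("easy->hard accuracy:", head.score(hard_X, hard_y))     # evaluate on hard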
arXiv Detail & Related papers (2024-01-12T18:36:29Z)
- Non-Vacuous Generalization Bounds for Large Language Models [78.42762571499061]
We provide the first non-vacuous generalization bounds for pretrained large language models.
We show that larger models have better generalization bounds and are more compressible than smaller models.
arXiv Detail & Related papers (2023-12-28T17:58:42Z)
- Data Factors for Better Compositional Generalization [60.698130703909804]
We conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors.
We show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges.
We explore how training examples of different difficulty levels influence generalization differently.
arXiv Detail & Related papers (2023-11-08T01:27:34Z)
- What do larger image classifiers memorise? [64.01325988398838]
We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
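Knowledge distillation here refers to the standard setup of training a student against a teacher's softened outputs. A generic formulation is sketched below; the temperature and mixing weight are common defaults, not values from the paper.

```python
# Generic knowledge-distillation loss (a common formulation; T and alpha are
# illustrative defaults, not the paper's values).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                               # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```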
arXiv Detail & Related papers (2023-10-09T01:52:07Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
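The abstract does not give the exact loss, but a regularizer exploiting high intra-class correlation of embeddings could, as a rough guess at the general idea only (not the Phantom Embeddings formulation), pull each embedding toward its class centroid:

```python
# Illustrative intra-class embedding regularizer (a guess at the general idea;
# NOT the Phantom Embeddings method, which the abstract does not specify).
import torch

def intra_class_pull(embeddings, labels):
    """Penalize distance of each embedding from its class centroid."""
    classes = labels.unique()
    loss = embeddings.new_zeros(())
    for c in classes:
        z = embeddings[labels == c]
        loss = loss + (z - z.mean(dim=0)).pow(2).sum(dim=1).mean()
    return loss / classes.numel()
```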
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- The Curious Case of Benign Memorization [19.74244993871716]
We show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way.
We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers.
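One common way to test which layers carry the memorization, in the spirit of this finding, is linear probing of frozen intermediate features. A hypothetical sketch follows; the untrained network is a placeholder for a model that has been fit to random labels.

```python
# Hypothetical layer-wise probe: fit a linear classifier on frozen features
# from each layer to see which layers linearly separate the random labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))               # random labels
layers = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU())
                        for _ in range(4)])   # stand-in for a trained network

feats, h = [], x
for layer in layers:
    h = layer(h)
    feats.append(h.detach())                  # freeze features at each depth

for i, f in enumerate(feats):
    probe = nn.Linear(f.shape[1], 2)
    opt = torch.optim.Adam(probe.parameters(), lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        F.cross_entropy(probe(f), y).backward()
        opt.step()
    acc = (probe(f).argmax(dim=1) == y).float().mean().item()
    print(f"layer {i}: probe accuracy on random labels = {acc:.2f}")
```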
arXiv Detail & Related papers (2022-10-25T13:41:31Z)
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models [64.22311189896888]
We study exact memorization in causal and masked language modeling, across model sizes and throughout the training process.
Surprisingly, we show that larger models can memorize a larger portion of the data before overfitting and tend to forget less throughout the training process.
arXiv Detail & Related papers (2022-05-22T07:43:50Z)
- Contrasting random and learned features in deep Bayesian linear regression [12.234742322758418]
We study how the ability to learn affects the generalization performance of a simple class of models.
By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch.
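The random-feature vs. fully-trained contrast can be mimicked, far from the paper's Bayesian treatment, by freezing all hidden layers so that only the readout is trained. A toy sketch, with an arbitrary synthetic task:

```python
# Toy contrast (not the paper's Bayesian analysis): a deep random-feature model
# (hidden layers frozen, readout trained) vs. training all layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(256, 16)
y = (x.sum(dim=1, keepdim=True) > 0).float()

def fit(train_all_layers):
    net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    if train_all_layers:
        params = net.parameters()
    else:                                     # random features: readout only
        for p in list(net.parameters())[:-2]:
            p.requires_grad_(False)
        params = net[-1].parameters()
    opt = torch.optim.Adam(params, lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        F.binary_cross_entropy_with_logits(net(x), y).backward()
        opt.step()
    return ((net(x) > 0).float() == y).float().mean().item()

print("random features:", fit(False), "fully trained:", fit(True))
```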
arXiv Detail & Related papers (2022-03-01T15:51:29Z)
- Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim.
This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others).
We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
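The underlying measurement can be sketched as a verbatim-continuation check: prompt the model with a prefix drawn from its training data and test whether greedy decoding reproduces the true continuation. A minimal sketch assuming the Hugging Face transformers API; "gpt2" and the token counts are placeholders, and this is the general idea rather than the paper's exact protocol.

```python
# Sketch of a verbatim-memorization check; "gpt2" and the prefix/continuation
# lengths are placeholders, not the paper's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def is_memorized(training_text, prefix_len=32, cont_len=32):
    ids = tok(training_text, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + cont_len]
    out = model.generate(prefix, max_new_tokens=cont_len, do_sample=False)
    return torch.equal(out[0, prefix_len:], target)  # exact verbatim match?
```

Sweeping such a check across models and prompts is the kind of measurement behind the log-linear relationships described above.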
arXiv Detail & Related papers (2022-02-15T18:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.