On the geometry of generalization and memorization in deep neural
networks
- URL: http://arxiv.org/abs/2105.14602v1
- Date: Sun, 30 May 2021 19:07:33 GMT
- Title: On the geometry of generalization and memorization in deep neural
networks
- Authors: Cory Stephenson, Suchismita Padhy, Abhinav Ganesh, Yue Hui, Hanlin
Tang and SueYeon Chung
- Abstract summary: We study the structure of when and where memorization occurs in a deep network.
All layers preferentially learn from examples which share features, and link this behavior to generalization performance.
- We find that memorization predominately occurs in the deeper layers, due to decreasing object manifolds' radius and dimension.
- Score: 15.250162344382051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how large neural networks avoid memorizing training data is key
to explaining their high generalization performance. To examine the structure
of when and where memorization occurs in a deep network, we use a recently
developed replica-based mean field theoretic geometric analysis method. We find
that all layers preferentially learn from examples which share features, and
link this behavior to generalization performance. Memorization predominately
occurs in the deeper layers, due to decreasing object manifolds' radius and
dimension, whereas early layers are minimally affected. This predicts that
generalization can be restored by reverting the final few layer weights to
earlier epochs before significant memorization occurred, which is confirmed by
the experiments. Additionally, by studying generalization under different model
sizes, we reveal the connection between the double descent phenomenon and the
underlying model geometry. Finally, analytical results show that networks
avoid memorization early in training because, close to initialization, the
gradient contributions from permuted examples are small. These findings provide
quantitative evidence for the structure of memorization across layers of a deep
neural network, the drivers for such structure, and its connection to manifold
geometric properties.
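The layer-rewinding experiment described in the abstract (reverting the final few layers' weights to an epoch before significant memorization) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the layer names, checkpoint structure, and weight values are all hypothetical:

```python
import copy

def rewind_final_layers(current_weights, checkpoints, rewind_epoch, layers_to_rewind):
    """Revert the named (deeper) layers to their weights at an earlier epoch,
    leaving the remaining (earlier) layers at their final trained values."""
    restored = copy.deepcopy(current_weights)
    early = checkpoints[rewind_epoch]
    for name in layers_to_rewind:
        restored[name] = copy.deepcopy(early[name])
    return restored

# Hypothetical two-layer "network" stored as a dict of weight lists,
# checkpointed at an early epoch and after training has overfit.
checkpoints = {
    0: {"conv1": [0.1, 0.2], "fc": [0.5, 0.5]},   # before memorization
    9: {"conv1": [0.3, 0.1], "fc": [2.0, -1.7]},  # after overfitting
}
final = checkpoints[9]
restored = rewind_final_layers(final, checkpoints,
                               rewind_epoch=0, layers_to_rewind=["fc"])
print(restored)  # conv1 keeps its epoch-9 values; fc reverts to epoch 0
```

The key design point, matching the paper's finding that early layers are minimally affected by memorization, is that only the deep layers listed in `layers_to_rewind` are touched.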
Related papers
- Generalization Below the Edge of Stability: The Role of Data Geometry [60.147710896851045]
We show how data geometry controls generalization in ReLU networks trained below the edge of stability. For data distributions supported on a mixture of low-dimensional balls, we derive generalization bounds that provably adapt to the intrinsic dimension. Our results consolidate disparate empirical findings that have appeared in the literature.
arXiv Detail & Related papers (2025-10-20T21:40:36Z) - Approximating Latent Manifolds in Neural Networks via Vanishing Ideals [20.464009622419766]
We establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifold of deep networks.
We propose a new neural architecture that truncates a pretrained network at an intermediate layer, and approximates each class manifold via generators of the vanishing ideal.
The resulting models have significantly fewer layers than their pretrained baselines, while maintaining comparable accuracy, achieving higher throughput and utilizing fewer parameters.
arXiv Detail & Related papers (2025-02-20T21:23:02Z) - Storing overlapping associative memories on latent manifolds in low-rank spiking networks [5.041384008847852]
We revisit the associative memory problem in light of advances in understanding spike-based computation.
We show that the spiking activity for a large class of all-inhibitory networks is situated on a low-dimensional, convex, and piecewise-linear manifold.
We propose several learning rules, and demonstrate a linear scaling of the storage capacity with the number of neurons, as well as robust pattern completion abilities.
arXiv Detail & Related papers (2024-11-26T14:48:25Z) - A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes [49.32130498861987]
We study the case of non-differentiable activation functions, such as ReLU.
Two recent works introduced a geometric framework to study neural networks.
We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.
arXiv Detail & Related papers (2024-04-09T08:11:46Z) - Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
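The within-class compression described above can be illustrated with a toy metric; this is a minimal sketch in plain Python, where the scalar features, class labels, and the variance-ratio measure are illustrative assumptions rather than the paper's actual analysis:

```python
from statistics import pvariance

def compression_ratio(features, labels):
    """Pooled within-class variance divided by total variance of scalar
    features. A ratio near 0 means same-class features have collapsed
    together (strong within-class compression)."""
    total_var = pvariance(features)
    n = len(features)
    within = 0.0
    for c in set(labels):
        vals = [f for f, l in zip(features, labels) if l == c]
        within += len(vals) * pvariance(vals)
    within /= n
    return within / total_var

# Hypothetical scalar features at a shallow vs. a deep layer, two classes.
labels  = ["a", "a", "b", "b"]
shallow = [0.0, 1.0, 0.5, 1.5]   # classes overlap heavily
deep    = [0.0, 0.1, 2.0, 2.1]   # classes compressed and well separated
print(compression_ratio(shallow, labels))  # 0.8
print(compression_ratio(deep, labels) < 0.01)  # True
```

Under the paper's claim, this ratio would shrink geometrically with depth for within-class features while between-class separation grows.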
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend residual neural networks (ResNets) to Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z) - The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's reconstruction ability of the input and prediction performance based on the evolution of parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Compressive Sensing and Neural Networks from a Statistical Learning
Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z) - Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as improved stability and performance over their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.