Slope and generalization properties of neural networks
- URL: http://arxiv.org/abs/2107.01473v1
- Date: Sat, 3 Jul 2021 17:54:27 GMT
- Title: Slope and generalization properties of neural networks
- Authors: Anton Johansson, Niklas Engsner, Claes Stranneg{\aa}rd, Petter Mostad
- Abstract summary: We show that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network.
The slope is of similar size throughout the relevant volume, and varies smoothly. It also behaves as predicted in rescaling examples.
We discuss possible applications of the slope concept, such as using it as a part of the loss function or stopping criterion during network training, or ranking data sets in terms of their complexity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are very successful tools in for example advanced
classification. From a statistical point of view, fitting a neural network may
be seen as a kind of regression, where we seek a function from the input space
to a space of classification probabilities that follows the "general" shape of
the data, but avoids overfitting by avoiding memorization of individual data
points. In statistics, this can be done by controlling the geometric complexity
of the regression function. We propose to do something similar when fitting
neural networks by controlling the slope of the network.
After defining the slope and discussing some of its theoretical properties,
we go on to show empirically in examples, using ReLU networks, that the
distribution of the slope of a well-trained neural network classifier is
generally independent of the width of the layers in a fully connected network,
and that the mean of the distribution only has a weak dependence on the model
architecture in general. The slope is of similar size throughout the relevant
volume, and varies smoothly. It also behaves as predicted in rescaling
examples. We discuss possible applications of the slope concept, such as using
it as a part of the loss function or stopping criterion during network
training, or ranking data sets in terms of their complexity.
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural
Isometry and Exact Recovery [33.74925020397343]
Deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters.
We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization.
We show that ReLU networks learn simple and sparse models even when the labels are noisy.
arXiv Detail & Related papers (2022-09-30T06:47:15Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Recurrent neural networks that generalize from examples and optimize by
dreaming [0.0]
We introduce a generalized Hopfield network where pairwise couplings between neurons are built according to Hebb's prescription for on-line learning.
We let the network experience solely a dataset made of a sample of noisy examples for each pattern.
Remarkably, the sleeping mechanisms always significantly reduce the dataset size required to correctly generalize.
arXiv Detail & Related papers (2022-04-17T08:40:54Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Dive into Layers: Neural Network Capacity Bounding using Algebraic
Geometry [55.57953219617467]
We show that the learnability of a neural network is directly related to its size.
We use Betti numbers to measure the topological geometric complexity of input data and the neural network.
We perform the experiments on a real-world dataset MNIST and the results verify our analysis and conclusion.
arXiv Detail & Related papers (2021-09-03T11:45:51Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Neural Networks and Polynomial Regression. Demystifying the
Overparametrization Phenomena [17.205106391379026]
In the context of neural network models, overparametrization refers to the phenomena whereby these models appear to generalize well on the unseen data.
A conventional explanation of this phenomena is based on self-regularization properties of algorithms used to train the data.
We show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension.
arXiv Detail & Related papers (2020-03-23T20:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.