Learning threshold neurons via the "edge of stability"
- URL: http://arxiv.org/abs/2212.07469v2
- Date: Thu, 19 Oct 2023 12:00:54 GMT
- Title: Learning threshold neurons via the "edge of stability"
- Authors: Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang
- Abstract summary: Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate.
Empirical work instead exhibits "edge of stability" (or "unstable convergence") phenomena at large learning rates, including on two-layer neural networks.
This paper performs a detailed analysis of gradient descent for simplified models of two-layer neural networks.
- Score: 33.64379851307296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing analyses of neural network training often operate under the
unrealistic assumption of an extremely small learning rate. This lies in stark
contrast to practical wisdom and empirical studies, such as the work of J.
Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of
stability" or "unstable convergence") and potential benefits for generalization
in the large learning rate regime. Despite a flurry of recent works on this
topic, however, the latter effect is still poorly understood. In this paper, we
take a step towards understanding genuinely non-convex training dynamics with
large learning rates by performing a detailed analysis of gradient descent for
simplified models of two-layer neural networks. For these models, we provably
establish the edge of stability phenomenon and discover a sharp phase
transition for the step size below which the neural network fails to learn
"threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This
elucidates one possible mechanism by which the edge of stability can in fact
lead to better generalization, as threshold neurons are basic building blocks
with useful inductive bias for many tasks.
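To make the phenomenon concrete, here is a minimal numerical sketch (not the paper's simplified model; the network, data, and step size are illustrative assumptions): full-batch gradient descent on a tiny two-layer tanh network, tracking the loss Hessian's top eigenvalue (the "sharpness") against the stability threshold 2/eta. In runs like this, the sharpness typically rises during training and then hovers near 2/eta, which is the edge-of-stability signature.

```python
# Minimal edge-of-stability illustration (illustrative toy, not the paper's
# exact model): full-batch GD on a tiny two-layer tanh network, tracking the
# sharpness (top Hessian eigenvalue) against the 2/eta stability threshold.
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 8                        # data points, hidden units
X = np.linspace(-2.0, 2.0, n)
Y = np.sin(2.0 * X)                 # toy regression target
eta = 0.2                           # deliberately large step size

theta = 0.5 * rng.standard_normal(3 * m)   # packed parameters [w, b, v]

def unpack(t):
    return t[:m], t[m:2 * m], t[2 * m:]

def loss_grad(t):
    w, b, v = unpack(t)
    H = np.tanh(np.outer(X, w) + b)          # (n, m) hidden activations
    r = H @ v - Y                            # residuals
    S = (1.0 - H ** 2) * v                   # backprop through tanh
    gw = (r[:, None] * S * X[:, None]).mean(axis=0)
    gb = (r[:, None] * S).mean(axis=0)
    gv = (r[:, None] * H).mean(axis=0)
    return 0.5 * np.mean(r ** 2), np.concatenate([gw, gb, gv])

def sharpness(t, iters=30, eps=1e-4):
    """Top Hessian eigenvalue via power iteration on finite-difference
    Hessian-vector products (no explicit Hessian needed)."""
    u = rng.standard_normal(t.size)
    u /= np.linalg.norm(u)
    lam = 0.0
    for _ in range(iters):
        hv = (loss_grad(t + eps * u)[1] - loss_grad(t - eps * u)[1]) / (2 * eps)
        lam = u @ hv
        u = hv / (np.linalg.norm(hv) + 1e-12)
    return lam

for step in range(2001):
    loss, g = loss_grad(theta)
    theta -= eta * g
    if step % 400 == 0:
        print(f"step {step:4d}  loss {loss:.4f}  "
              f"sharpness {sharpness(theta):.2f}  (2/eta = {2/eta:.1f})")
```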
Related papers
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
A potential solution is Neuro-Symbolic Integration (NeSy), where neural approaches are combined with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems; a generic sketch of the perception-to-symbols pipeline follows this entry.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
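A minimal sketch of the generic NeSy pipeline described above (hypothetical stand-in components, not the paper's method): a neural classifier maps each perception to a distribution over symbols, and a symbolic reasoner combines the symbols into the downstream output, here digit addition.

```python
# Hedged sketch of a generic NeSy pipeline: neural perception-to-symbols,
# then a symbolic reasoner. All components are illustrative stand-ins.
import numpy as np

def neural_symbol_probs(rng):
    """Stand-in for a trained perception network: returns P(digit | image)."""
    logits = rng.standard_normal(10)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reason_sum(p_a, p_b):
    """Symbolic reasoner: P(a + b = s), marginalising over digit pairs."""
    out = np.zeros(19)
    for i in range(10):
        for j in range(10):
            out[i + j] += p_a[i] * p_b[j]
    return out

rng = np.random.default_rng(0)
p_sum = reason_sum(neural_symbol_probs(rng), neural_symbol_probs(rng))
print("most likely sum:", p_sum.argmax())
```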
- Decorrelating neurons using persistence [29.25969187808722]
We present two regularisation terms computed from the weights of a minimum spanning tree of a clique.
We demonstrate that naively minimising all pairwise correlations between neurons yields lower accuracy than our regularisation terms.
We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms; one reading of the construction is sketched after this entry.
arXiv Detail & Related papers (2023-08-09T11:09:14Z)
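One plausible reading of the construction, as a hedged sketch (the edge weights and sign convention are assumptions, and unlike the paper's persistence-based terms this NumPy version is not differentiable; it only illustrates the quantity):

```python
# Hedged sketch of an MST-based decorrelation penalty: build the complete
# graph ("clique") on neurons with edge weights 1 - |correlation|, take its
# minimum spanning tree, and reward a long tree (decorrelated neurons).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_decorrelation_penalty(activations):
    """activations: (batch, neurons) hidden-layer outputs."""
    C = np.corrcoef(activations.T)      # (neurons, neurons) correlations
    D = 1.0 - np.abs(C)                 # dissimilarity weights on the clique
    mst_total = minimum_spanning_tree(D).sum()
    return -mst_total                   # lower penalty = less correlated

acts = np.random.default_rng(1).standard_normal((256, 32))
print(mst_decorrelation_penalty(acts))
```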
- What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [0.0]
We study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods.
We show how NTKs allow adversarial examples to be generated in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural network counterparts in the "lazy" regime; a sketch of the empirical NTK follows this entry.
arXiv Detail & Related papers (2022-10-11T16:11:48Z)
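A minimal sketch of the central object, the empirical NTK of a small two-layer ReLU network (the network and its scaling are assumptions; the attack itself is omitted, this only shows the kernel computation):

```python
# Empirical NTK of f(x) = a . relu(W x): the inner product of parameter
# gradients at two inputs. Illustrative toy, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 200
W = rng.standard_normal((m, d)) / np.sqrt(d)   # first layer
a = rng.standard_normal(m) / np.sqrt(m)        # second layer

def empirical_ntk(x1, x2):
    h1, h2 = np.maximum(W @ x1, 0.0), np.maximum(W @ x2, 0.0)
    term_a = h1 @ h2                              # gradients w.r.t. a
    mask = (W @ x1 > 0) & (W @ x2 > 0)
    term_W = (a ** 2 * mask).sum() * (x1 @ x2)    # gradients w.r.t. W
    return term_a + term_W

x, xp = rng.standard_normal(d), rng.standard_normal(d)
print(empirical_ntk(x, xp))
```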
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed; a generic spiking-neuron sketch follows this entry.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
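For intuition about the temporal sparsity mentioned above, here is a generic leaky integrate-and-fire neuron (a textbook illustration, not the paper's regression framework; all constants are assumptions):

```python
# Leaky integrate-and-fire neuron: integrate input, spike at threshold.
import numpy as np

def lif(current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    v, spikes = 0.0, []
    for i in current:
        v += dt / tau * (-v + i)      # leaky membrane integration (Euler step)
        if v >= v_thresh:
            spikes.append(1)          # fire ...
            v = v_reset               # ... and reset
        else:
            spikes.append(0)
    return np.array(spikes)

spikes = lif(np.full(1000, 1.5))      # 1 s of constant drive at 1 ms steps
print(f"output rate: {spikes.mean() / 1e-3:.0f} Hz, "
      f"sparsity: {1 - spikes.mean():.2%} silent steps")
```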
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant-level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures [15.946511512356878]
We show that a fully-learned neural network with the right inductive biases can perform substantially better than all previous neural-symbolic models.
Our model makes critical use of both self-attention and learned "soft" object-centric representations.
arXiv Detail & Related papers (2020-12-15T18:57:40Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks; a toy illustration follows this entry.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
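A toy illustration of the effect (the synthetic data and hyperparameters are assumptions, not the paper's setup): two features both predict the label, but gradient descent on the logistic loss concentrates weight on the stronger feature and starves the weaker one.

```python
# Toy gradient-starvation demo: both features separate the classes, yet GD
# on cross-entropy puts almost all weight on the larger-margin feature.
import numpy as np

rng = np.random.default_rng(0)
n = 500
y = rng.choice([-1.0, 1.0], size=n)
x1 = 3.0 * y + 0.1 * rng.standard_normal(n)   # strong, large-margin feature
x2 = 0.5 * y + 0.1 * rng.standard_normal(n)   # weaker but fully predictive
X = np.stack([x1, x2], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
eta = 0.1
for _ in range(5000):
    s = sigmoid(-y * (X @ w))          # per-example sensitivity of the loss
    w -= eta * (-(X.T @ (y * s)) / n)  # GD on mean logistic loss

print("weights:", w)  # typically w[0] >> w[1]: the weak feature is starved
```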
- Geometry Perspective Of Estimating Learning Capability Of Neural Networks [0.0]
The paper considers a broad class of neural networks with generalized architectures performing simple least-squares regression with stochastic gradient descent (SGD).
The relationship between generalization capability and the stability of the neural network is also discussed.
By correlating the principles of high-energy physics with the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
arXiv Detail & Related papers (2020-11-03T12:03:19Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay; a toy training loop follows this entry.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
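A toy training loop in the spirit of the setting above (architecture, scaling, and hyperparameters are illustrative assumptions): noisy gradient descent with weight decay on a wide two-layer ReLU network, where the training loss decays roughly geometrically until it stalls near a noise floor.

```python
# Noisy GD with weight decay on a wide two-layer ReLU net (second layer
# fixed). Illustrative sketch only, not the paper's exact regime.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 64, 10, 512                  # samples, input dim, (wide) hidden dim
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(X[:, 0])                   # simple binary target

W = rng.standard_normal((m, d))                     # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed second layer

eta, wd, noise = 2.0, 1e-4, 1e-3
for step in range(2001):
    H = np.maximum(X @ W.T, 0.0)       # (n, m) ReLU features
    r = H @ a - y                      # residuals
    loss = 0.5 * np.mean(r ** 2)
    G = ((r[:, None] * (H > 0) * a).T @ X) / n   # gradient w.r.t. W
    W -= eta * (G + wd * W) + noise * rng.standard_normal((m, d))
    if step % 400 == 0:
        print(f"step {step:4d}  train loss {loss:.5f}")
```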