Stabilizing Spiking Neuron Training
- URL: http://arxiv.org/abs/2202.00282v4
- Date: Fri, 5 Jan 2024 00:28:16 GMT
- Title: Stabilizing Spiking Neuron Training
- Authors: Luca Herranz-Celotti and Jean Rouat
- Abstract summary: Spiking Neuromorphic Computing uses binary activity to improve Artificial Intelligence energy efficiency.
It remains unclear how to determine the best SG for a given task and network.
We show how it can be used to reduce the need for extensive grid-search of dampening, sharpness and tail-fatness of the SG.
- Score: 3.335932527835653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stability arguments are often used to prevent learning algorithms from developing ever-increasing activity and weights that hinder generalization. However, stability conditions can clash with the sparsity required to augment the energy efficiency of spiking neurons; nonetheless, they can also provide solutions. In fact, spiking Neuromorphic Computing uses binary activity to improve Artificial Intelligence energy efficiency. However, its non-smoothness requires approximate gradients, known as Surrogate Gradients (SG), to close the performance gap with Deep Learning. Several SGs have been proposed in the literature, but it remains unclear how to determine the best SG for a given task and network. We therefore aim to theoretically define the best SG, through stability arguments, to reduce the need for grid search. We show that more complex tasks and networks require a more careful choice of SG, even though the derivative of the fast sigmoid tends to outperform the others across a wide range of learning rates. We then design a stability-based theoretical method to choose the initialization and SG shape before training on the most common spiking neuron, the Leaky Integrate and Fire (LIF). Since our stability method suggests the use of high firing rates at initialization, which is non-standard in the neuromorphic literature, we show that high initial firing rates, combined with a sparsity-encouraging loss term introduced gradually, can lead to better generalization, depending on the SG shape. Our stability-based theoretical solution finds an SG and initialization that experimentally result in improved accuracy. We show how it can be used to reduce the need for an extensive grid search over the dampening, sharpness, and tail-fatness of the SG. We also show that our stability concepts extend to different LIF variants, such as DECOLLE and fluctuation-driven initializations.
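To make these choices concrete, the sketch below shows a discrete-time LIF step in PyTorch that spikes with a Heaviside forward pass and backpropagates through the derivative of a fast sigmoid, together with a gradually introduced firing-rate penalty. It is a minimal illustration under stated assumptions: the names and default values (dampening, sharpness, tau, v_th, the linear ramp, the target rate) are illustrative, not the paper's exact parameterization.

```python
import math
import torch


class FastSigmoidSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; derivative of a fast sigmoid,
    dampening / (1 + sharpness * |v|)**2, as the surrogate gradient."""

    @staticmethod
    def forward(ctx, v, dampening, sharpness):
        ctx.save_for_backward(v)
        ctx.dampening, ctx.sharpness = dampening, sharpness
        return (v > 0).to(v.dtype)  # binary spikes

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        sg = ctx.dampening / (1.0 + ctx.sharpness * v.abs()) ** 2
        # Gradients for (v, dampening, sharpness); the SG shape parameters
        # are treated as fixed hyperparameters here.
        return grad_output * sg, None, None


def lif_step(v, x, w, tau=20.0, v_th=1.0, dampening=0.3, sharpness=1.0):
    """One discrete-time LIF update: leak, integrate input, spike, soft reset."""
    decay = math.exp(-1.0 / tau)
    v = decay * v + x @ w                              # leaky integration
    spikes = FastSigmoidSpike.apply(v - v_th, dampening, sharpness)
    v = v - spikes * v_th                              # soft reset after a spike
    return v, spikes


def ramped_sparsity_loss(spikes, step, ramp_steps=1000, target_rate=0.05):
    """Firing-rate penalty whose weight is introduced gradually; the linear
    ramp is an assumption, the abstract only says the term is added gradually."""
    weight = min(1.0, step / ramp_steps)
    return weight * (spikes.mean() - target_rate) ** 2
```

In this sketch, the quantities the paper's stability analysis would fix before training are the SG shape (dampening, sharpness) and the weight/threshold initialization that sets the initial firing rate; the grid search over dampening, sharpness, and tail-fatness mentioned in the abstract targets exactly this kind of shape parameter (tail-fatness is not modeled in this two-parameter sketch).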
Related papers
- The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization [4.7256945641654164]
Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training.
Recent studies of SGD for canonical quadratic optimization or linear regression show that it attains good generalization under suitable high-dimensional settings.
This paper investigates SGD with two essential components in practice: exponentially decaying step size schedule and momentum.
arXiv Detail & Related papers (2024-09-15T14:20:03Z)
- A Precise Characterization of SGD Stability Using Loss Surface Geometry [8.942671556572073]
Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding.
Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates.
arXiv Detail & Related papers (2024-01-22T19:46:30Z)
- Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks [3.485537704990941]
The surrogate gradient (SG) strategy is investigated and applied to circumvent the non-differentiability of spikes and train SNNs from scratch.
We propose the parametric surrogate gradient (PSG) method to iteratively update the SG and eventually determine an optimal surrogate gradient parameter (a minimal sketch of a learnable SG shape parameter follows this list).
Experimental results demonstrate that the proposed methods can be readily integrated with backpropagation through time (BPTT) algorithm.
arXiv Detail & Related papers (2023-04-26T05:02:41Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- A High Probability Analysis of Adaptive SGD with Momentum [22.9530287983179]
Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications.
We show, for the first time, high-probability convergence of the gradients to zero in the smooth non-convex setting for Delayed AdaGrad with momentum.
arXiv Detail & Related papers (2020-07-28T15:06:22Z)
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
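As a companion to the PSG entry above, here is a minimal sketch of the general idea of a trainable surrogate-gradient shape parameter. It is not the PSG update rule of that paper: the gradient with respect to beta is computed as if the forward pass had been the smooth fast sigmoid v / (1 + beta * |v|), which is an illustrative assumption.

```python
import torch
import torch.nn as nn


class ParametricFastSigmoidSpike(torch.autograd.Function):
    """Heaviside spike forward; fast-sigmoid surrogate backward with a
    learnable sharpness beta (assumed positive)."""

    @staticmethod
    def forward(ctx, v, beta):
        ctx.save_for_backward(v, beta)
        return (v > 0).to(v.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        v, beta = ctx.saved_tensors
        denom = 1.0 + beta * v.abs()
        grad_v = grad_output / denom ** 2                              # d/dv of v/(1+beta|v|)
        grad_beta = (grad_output * (-v * v.abs()) / denom ** 2).sum()  # d/dbeta of v/(1+beta|v|)
        return grad_v, grad_beta


class SpikeWithLearnableSG(nn.Module):
    def __init__(self, beta_init=1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta_init))  # trained alongside the weights

    def forward(self, v_minus_threshold):
        return ParametricFastSigmoidSpike.apply(v_minus_threshold, self.beta)
```

Because beta is an nn.Parameter, whatever optimizer updates the synaptic weights also updates the SG shape, which captures the spirit of iteratively determining the surrogate gradient parameter.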
This list is automatically generated from the titles and abstracts of the papers in this site.