Related papers: Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks

URL: http://arxiv.org/abs/2401.04486v2
Date: Mon, 30 Sep 2024 00:00:16 GMT
Title: Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks
Authors: Yufei Guo, Yuanpei Chen, Zecheng Hao, Weihang Peng, Zhou Jie, Yuhan Zhang, Xiaode Liu, Zhe Ma,
Abstract summary: Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention. Training an SNN directly poses a challenge due to the undefined gradient of the firing spike process. We propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers.
Score: 15.691263438655842
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: The Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention. It utilizes binary spike activations to transmit information, thereby replacing multiplications with additions and resulting in high energy efficiency. However, training an SNN directly poses a challenge due to the undefined gradient of the firing spike process. Although prior works have employed various surrogate gradient training methods that use an alternative function to replace the firing process during back-propagation, these approaches ignore an intrinsic problem: gradient vanishing. To address this issue, we propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers. This enables us to present the gradient to the shallow layers directly, thereby significantly mitigating the gradient vanishing problem. Additionally, this method does not introduce any burden during the inference phase. To strike a balance between final accuracy and ease of training, we also propose an evolutionary training framework and implement it by introducing a balance coefficient that dynamically changes with the training epoch, which further improves the network's performance. Extensive experiments conducted over static and dynamic datasets using several popular network structures reveal that our method consistently outperforms state-of-the-art methods.
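
The balance coefficient described in the abstract can be illustrated with a minimal sketch. The abstract does not give the schedule or the exact loss, so everything here is an assumption made for illustration: the function names `balance_coefficient` and `combined_loss` are hypothetical, the schedule is linear, and the loss is a weighted sum of the main-output loss and the mean loss of the shortcut branches. Early epochs favor the shortcut branches (ease of training), and later epochs favor the main output (final accuracy).

```python
from typing import Sequence


def balance_coefficient(epoch: int, total_epochs: int) -> float:
    """Hypothetical linear schedule: 0.0 at the first epoch, 1.0 at the last.

    This is an assumed form; the abstract only says the coefficient
    changes dynamically with the training epoch.
    """
    if total_epochs < 1:
        raise ValueError("total_epochs must be at least 1")
    if total_epochs == 1:
        return 1.0
    # Clamp the epoch so out-of-range values still give a weight in [0, 1].
    clamped = min(max(epoch, 0), total_epochs - 1)
    return clamped / (total_epochs - 1)


def combined_loss(
    main_loss: float, shortcut_losses: Sequence[float], lam: float
) -> float:
    """Blend the main-output loss with the mean shortcut-branch loss.

    lam = 0.0 trains only through the shortcut branches,
    lam = 1.0 trains only through the main output (assumed weighting).
    """
    if not shortcut_losses:
        return main_loss
    shortcut_mean = sum(shortcut_losses) / len(shortcut_losses)
    return lam * main_loss + (1.0 - lam) * shortcut_mean


if __name__ == "__main__":
    total = 5
    for epoch in range(total):
        lam = balance_coefficient(epoch, total)
        loss = combined_loss(main_loss=2.0, shortcut_losses=[4.0, 6.0], lam=lam)
        print(f"epoch={epoch} lam={lam:.2f} loss={loss:.3f}")
```

Because the shortcut branches only contribute to the training loss in this sketch, they can be dropped at inference time, which is consistent with the abstract's statement that the method adds no inference burden.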

Related papers

Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition [0.5024983453990065]
Layer-wise training eliminates the need for cross-entropy loss and backpropagation. Most existing layer-wise training approaches have been evaluated only on relatively small datasets. We propose a novel layer-wise training approach based on the recently developed deterministic information bottleneck (DIB) and the matrix-based Rényi's $\alpha$-order entropy functional.
arXiv Detail & Related papers (2025-10-31T17:24:58Z)
Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks [6.185603604308997]
Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains. Spiking Neural Networks (SNNs) represent a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges. We propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process, while still exploiting online environment interactions with the spiking policy.
arXiv Detail & Related papers (2025-10-28T14:28:40Z)
Randomized Forward Mode Gradient for Spiking Neural Networks in Scientific Machine Learning [4.178826560825283]
Spiking neural networks (SNNs) represent a promising approach in machine learning, combining the hierarchical learning capabilities of deep neural networks with the energy efficiency of spike-based computations. Traditional end-to-end training of SNNs is often based on back-propagation, where weight updates are derived from gradients computed through the chain rule. This method encounters challenges due to its limited biological plausibility and inefficiencies on neuromorphic hardware. In this study, we introduce an alternative training approach for SNNs. Instead of using back-propagation, we leverage weight perturbation methods within a forward-mode
arXiv Detail & Related papers (2024-11-11T15:20:54Z)
Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time. BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z)
A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique [0.0]
We propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer. Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets.
arXiv Detail & Related papers (2023-08-09T16:41:00Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems. However, PINNs suffer training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose employing the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware. We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
Dynamics-aware Adversarial Attack of Adaptive Neural Networks [75.50214601278455]
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks. We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient. Our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods.
arXiv Detail & Related papers (2022-10-15T01:32:08Z)
Scaling Private Deep Learning with Low-Rank and Sparse Gradients [5.14780936727027]
We propose a framework that exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates. A novel strategy is utilized to sparsify the gradients, resulting in low-dimensional, less noisy updates. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
arXiv Detail & Related papers (2022-07-06T14:09:47Z)
Backward Gradient Normalization in Deep Neural Networks [68.8204255655161]
We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. Results on tests with very deep neural networks show that the new technique can effectively control the gradient norm.
arXiv Detail & Related papers (2021-06-17T13:24:43Z)
Gradient Amplification: An efficient way to train deep neural networks [1.6542034477245091]
We propose a gradient amplification approach for training deep learning models to prevent vanishing gradients. We also develop a training strategy to enable or disable the gradient amplification method across several epochs with different learning rates.
arXiv Detail & Related papers (2020-06-16T20:30:55Z)
The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory. We show that using a large learning rate in the initial phase of training reduces the variance of the gradient. We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.