Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training
- URL: http://arxiv.org/abs/2312.09391v1
- Date: Thu, 14 Dec 2023 23:07:37 GMT
- Title: Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training
- Authors: Xi Chen, Chang Gao, Zuowen Wang, Longbiao Cheng, Sheng Zhou, Shih-Chii
Liu, Tobi Delbruck
- Abstract summary: This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge.
Results show a reduction of $sim$80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss.
We show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
- Score: 20.49255973077044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Networks (RNNs) are useful in temporal sequence tasks.
However, training RNNs involves dense matrix multiplications which require
hardware that can support a large number of arithmetic operations and memory
accesses. Implementing online training of RNNs on the edge calls for optimized
algorithms for an efficient deployment on hardware. Inspired by the spiking
neuron model, the Delta RNN exploits temporal sparsity during inference by
skipping over the update of hidden states from those inactivated neurons whose
change of activation across two timesteps is below a defined threshold. This
work describes a training algorithm for Delta RNNs that exploits temporal
sparsity in the backward propagation phase to reduce computational requirements
for training on the edge. Due to the symmetric computation graphs of forward
and backward propagation during training, the gradient computation of
inactivated neurons can be skipped. Results show a reduction of $\sim$80% in
matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech
Commands dataset with negligible accuracy loss. Logic simulations of a hardware
accelerator designed for the training algorithm show 2-10X speedup in matrix
computations for an activation sparsity range of 50%-90%. Additionally, we show
that the proposed Delta RNN training will be useful for online incremental
learning on edge devices with limited computing resources.
Related papers
- Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and
Dataflow Co-Design [15.47240906902083]
This paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design.
At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights.
At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both the regular dense operations and the computation-efficient N:M sparse operations.
arXiv Detail & Related papers (2023-09-22T17:26:19Z) - Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons [1.7056768055368383]
Spiking neural networks (SNN) are able to learn features while using less energy, especially on neuromorphic hardware.
Most widely used neuron in deep learning is the temporal and Fire (LIF) neuron.
arXiv Detail & Related papers (2023-06-22T04:25:27Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - RSC: Accelerating Graph Neural Networks Training via Randomized Sparse
Computations [56.59168541623729]
Training graph neural networks (GNNs) is time consuming because sparse graph-based operations are hard to be accelerated by hardware.
We explore trading off the computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which for the first time demonstrate the potential of training GNNs with approximated operations.
arXiv Detail & Related papers (2022-10-19T17:25:33Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - DNN Training Acceleration via Exploring GPGPU Friendly Sparsity [16.406482603838157]
We propose the Approximate Random Dropout that replaces the conventional random dropout of neurons and synapses with a regular and online generated row-based or tile-based dropout patterns.
We then develop a SGD-based Search Algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss.
We also propose the sensitivity-aware dropout method to dynamically drop the input feature maps based on their sensitivity so as to achieve greater forward and backward training acceleration.
arXiv Detail & Related papers (2022-03-11T01:32:03Z) - Efficient Training of Spiking Neural Networks with Temporally-Truncated
Local Backpropagation through Time [1.926678651590519]
Training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions.
This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm.
arXiv Detail & Related papers (2021-12-13T07:44:58Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%+1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than MPOs, in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP task.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.