Hessian Aware Quantization of Spiking Neural Networks
- URL: http://arxiv.org/abs/2104.14117v2
- Date: Mon, 23 Aug 2021 18:08:01 GMT
- Title: Hessian Aware Quantization of Spiking Neural Networks
- Authors: Hin Wai Lui and Emre Neftci
- Abstract summary: Neuromorphic architectures allow massively parallel computation with variable and local bit-precisions.
Current gradient-based methods of SNN training use a complex neuron model with multiple state variables.
We present a simplified neuron model that reduces the number of state variables by 4-fold while still being compatible with gradient-based training.
- Score: 1.90365714903665
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To achieve the low latency, high throughput, and energy efficiency benefits
of Spiking Neural Networks (SNNs), reducing the memory and compute requirements
when running on neuromorphic hardware is an important step. Neuromorphic
architectures allow massively parallel computation with variable and local
bit-precisions. However, how different bit-precisions should be allocated to
different layers or connections of the network is not trivial. In this work, we
demonstrate how a layer-wise Hessian trace analysis can measure the sensitivity
of the loss to any perturbation of the layer's weights, and this can be used to
guide the allocation of a layer-specific bit-precision when quantizing an SNN.
In addition, current gradient-based methods of SNN training use a complex
neuron model with multiple state variables, which is not ideal for compute and
memory efficiency. To address this challenge, we present a simplified neuron
model that reduces the number of state variables by 4-fold while still being
compatible with gradient-based training. We find that the impact on model
accuracy of using a layer-wise bit-precision correlates well with that
layer's Hessian trace. The accuracy of the optimal quantized network only
dropped by 0.2%, yet the network size was reduced by 58%. This reduces memory
usage and allows fixed-point arithmetic with simpler digital circuits to be
used, increasing the overall throughput and energy efficiency.
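The layer-wise Hessian trace that guides bit allocation does not require forming the full Hessian; it can be estimated from Hessian-vector products alone, for example with Hutchinson's randomized trace estimator. The sketch below is illustrative only (the toy matrix and function names are not from the paper; in practice the Hessian-vector product would come from automatic differentiation over a layer's weights):

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=1000, rng=None):
    """Estimate tr(H) using only Hessian-vector products.

    E[v^T H v] = tr(H) when v has i.i.d. Rademacher (+/-1) entries,
    so averaging v^T H v over random probes converges to the trace.
    """
    rng = np.random.default_rng(rng)
    est = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += v @ hvp(v)                      # v^T H v without forming H
    return est / num_samples

# Toy symmetric "layer Hessian"; a real SNN layer would supply hvp via autodiff.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
trace_est = hutchinson_trace(lambda v: H @ v, dim=2, num_samples=2000, rng=0)
```

Layers with a larger estimated trace are more sensitive to weight perturbation, so under this scheme they would receive more bits during quantization.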
Related papers
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Low Precision Quantization-aware Training in Spiking Neural Networks with Differentiable Quantization Function [0.5046831208137847]
This work aims to bridge the gap between recent progress in quantized neural networks and spiking neural networks.
It presents an extensive study on the performance of the quantization function, represented as a linear combination of sigmoid functions.
The presented quantization function demonstrates the state-of-the-art performance on four popular benchmarks.
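As a rough illustration of such a quantization function (this is a generic soft-staircase form, not necessarily the exact parameterization used in that paper), a sum of shifted sigmoids approximates rounding to discrete levels while remaining differentiable:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_quantize(x, levels=4, temperature=0.1):
    """Differentiable staircase: a linear combination of shifted sigmoids
    approximates rounding x to the integer levels 0..levels-1.
    Lower temperature -> sharper steps, closer to hard quantization."""
    thresholds = np.arange(levels - 1) + 0.5  # step centers at 0.5, 1.5, 2.5
    return sum(sigmoid((x - t) / temperature) for t in thresholds)

q = soft_quantize(np.linspace(0.0, 3.0, 7))
```

Because each step is a sigmoid, gradients flow through the quantizer during training, and annealing the temperature toward zero recovers a hard staircase.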
arXiv Detail & Related papers (2023-05-30T09:42:05Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Ultra-low Latency Adaptive Local Binary Spiking Neural Network with Accuracy Loss Estimator [4.554628904670269]
We propose an ultra-low latency adaptive local binary spiking neural network (ALBSNN) with accuracy loss estimators.
Experimental results show that this method can reduce storage space by more than 20% without losing network accuracy.
arXiv Detail & Related papers (2022-07-31T09:03:57Z)
- Converting Artificial Neural Networks to Spiking Neural Networks via Parameter Calibration [21.117214351356765]
Spiking Neural Networks (SNNs) are recognized as next-generation neural networks.
In this work, we argue that simply copying and pasting the weights of an ANN into an SNN inevitably results in activation mismatch.
We propose a set of layer-wise parameter calibration algorithms, which adjusts the parameters to minimize the activation mismatch.
arXiv Detail & Related papers (2022-05-06T18:22:09Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- $S^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks [41.54155265996312]
Shift neural networks reduce complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values.
Our proposed training method pushes the boundaries of shift neural networks and shows that 3-bit shift networks outperform their full-precision counterparts in top-1 accuracy on ImageNet.
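For intuition, the core idea of a shift network is to constrain each weight to a signed power of two, so a multiply becomes a bit shift. The rounding rule and range below are illustrative assumptions only; $S^3$ itself learns sign, sparsity, and shift as separate reparametrized components rather than rounding after the fact:

```python
import numpy as np

def to_shift_weight(w, max_shift=3):
    """Map a real weight to sign * 2^(-s) with s in 0..max_shift.

    Multiplication by the result is then a sign flip plus a right shift
    by s bits in fixed-point hardware."""
    sign = np.sign(w)
    mag = np.clip(np.abs(w), 2.0 ** -max_shift, 1.0)  # keep s in range
    s = np.round(-np.log2(mag))                       # nearest power of two
    return sign * 2.0 ** (-s)
```

Rounding in log2 space picks the nearest representable shift value, which is why 0.3 maps to 0.25 (a 2-bit shift) rather than 0.5.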
arXiv Detail & Related papers (2021-07-07T19:33:02Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance still falls short of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z) - Mixed-Precision Quantized Neural Network with Progressively Decreasing
Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences.