Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural
Networks
- URL: http://arxiv.org/abs/2104.12528v2
- Date: Thu, 29 Apr 2021 00:15:55 GMT
- Title: Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural
Networks
- Authors: Sayeed Shafayet Chowdhury, Isha Garg and Kaushik Roy
- Abstract summary: Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods.
However, a major drawback of SNNs is high inference latency.
In this paper, we propose spatial and temporal pruning of SNNs.
- Score: 6.011954485684313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spiking Neural Networks (SNNs) are a promising alternative to traditional
deep learning methods since they perform event-driven information processing.
However, a major drawback of SNNs is high inference latency. The efficiency of
SNNs could be enhanced using compression methods such as pruning and
quantization. Notably, SNNs, unlike their non-spiking counterparts, consist of
a temporal dimension, the compression of which can lead to latency reduction.
In this paper, we propose spatial and temporal pruning of SNNs. First,
structured spatial pruning is performed by determining the layer-wise
significant dimensions using principal component analysis of the average
accumulated membrane potential of the neurons. This step leads to 10-14X model
compression. Additionally, it enables inference with lower latency and
decreases the spike count per inference. To further reduce latency, temporal
pruning is performed by gradually reducing the timesteps while training. The
networks are trained using surrogate gradient descent based backpropagation and
we validate the results on CIFAR10 and CIFAR100, using VGG architectures. The
spatiotemporally pruned SNNs achieve 89.04% and 66.4% accuracy on CIFAR10 and
CIFAR100, respectively, while performing inference with 3-30X reduced latency
compared to state-of-the-art SNNs. Moreover, they require 8-14X less compute
energy compared to their unpruned standard deep learning counterparts. The
energy numbers are obtained by multiplying the number of operations with energy
per operation. These SNNs also provide 1-4% higher robustness against Gaussian
noise corrupted inputs. Furthermore, we perform weight quantization and find
that performance remains reasonably stable up to 5-bit quantization.
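The spatial pruning step described in the abstract relies on principal component analysis of the average accumulated membrane potential. Below is a minimal numpy sketch of how the layer-wise significant dimensions could be estimated; the 99% variance threshold, the shape of avg_potential, and the demo data are illustrative assumptions, not the authors' implementation.

import numpy as np

def significant_dims(avg_potential, var_threshold=0.99):
    """avg_potential: (num_samples, num_channels) membrane potentials for one
    layer, averaged over timesteps and spatial locations (assumed layout)."""
    centered = avg_potential - avg_potential.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)               # channel covariance
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending eigenvalues
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Number of principal components needed to explain var_threshold of the
    # variance; this becomes the retained width of the structurally pruned layer.
    return int(np.searchsorted(explained, var_threshold) + 1)

# Hypothetical usage: 512 calibration samples, a layer with 256 channels.
rng = np.random.default_rng(0)
demo = rng.normal(size=(512, 256)) @ rng.normal(size=(256, 256))
print("retained channels:", significant_dims(demo))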
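Temporal pruning, as described in the abstract, gradually reduces the number of timesteps while training continues. A rough sketch of such a schedule follows; the start/minimum step counts, the halving rule, and the train_one_epoch helper are hypothetical placeholders rather than the paper's exact procedure.

def temporal_pruning_schedule(start_steps=20, min_steps=5, shrink_every=30,
                              total_epochs=120):
    """Yield (epoch, timesteps) pairs with a gradually shrinking time window."""
    steps = start_steps
    for epoch in range(total_epochs):
        if epoch > 0 and epoch % shrink_every == 0 and steps > min_steps:
            steps = max(min_steps, steps // 2)   # shrink inference latency in stages
        yield epoch, steps

# Hypothetical usage inside a training script:
#   for epoch, T in temporal_pruning_schedule():
#       train_one_epoch(model, timesteps=T)   # unroll the SNN for T steps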
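The networks are trained with surrogate gradient descent based backpropagation. The PyTorch sketch below shows one common way to define a spike non-linearity with a triangular surrogate gradient; the threshold value and the surrogate shape are assumptions, since the abstract does not specify them.

import torch

THRESHOLD = 1.0   # assumed firing threshold

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential >= THRESHOLD).float()   # binary spike

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Triangular surrogate: nonzero gradient only near the threshold.
        surrogate = torch.clamp(1.0 - torch.abs(u - THRESHOLD), min=0.0)
        return grad_output * surrogate

# Hypothetical usage inside the SNN's time loop: spikes = SpikeFn.apply(u)
u = torch.randn(4, requires_grad=True)
SpikeFn.apply(u).sum().backward()
print(u.grad)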
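The compute-energy numbers are obtained by multiplying operation counts by energy per operation. A back-of-the-envelope sketch of that arithmetic, using assumed per-operation energies (accumulate for spike-driven SNNs vs. multiply-and-accumulate for ANNs) and hypothetical operation counts, not the paper's reported figures:

E_AC = 0.9e-12    # J per accumulate (SNN operation), assumed value
E_MAC = 4.6e-12   # J per multiply-and-accumulate (ANN operation), assumed value

ann_ops = 300e6          # hypothetical ANN MACs per inference
snn_ops = 300e6 * 0.15   # hypothetical SNN accumulates per inference (sparse spiking)

ann_energy = ann_ops * E_MAC
snn_energy = snn_ops * E_AC
print(f"ANN: {ann_energy:.2e} J  SNN: {snn_energy:.2e} J  "
      f"ratio: {ann_energy / snn_energy:.1f}x")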
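Finally, the abstract reports that performance remains reasonably stable down to 5-bit weight quantization. A minimal sketch of uniform, symmetric, per-tensor quantization with a max-abs scale; the paper's exact quantization recipe may differ.

import numpy as np

def quantize_weights(w, n_bits=5):
    """Quantize to n_bits signed integers, then de-quantize for inference."""
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 15 for 5 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

# Hypothetical usage on a random weight tensor:
w = np.random.default_rng(1).normal(scale=0.1, size=(64, 128))
w5 = quantize_weights(w, n_bits=5)
print("max abs quantization error:", np.abs(w - w5).max())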
Related papers
- Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks [50.32980443749865]
Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological plausibility.
Current SNNs struggle to balance accuracy and latency on neuromorphic datasets.
We propose the Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets.
arXiv Detail & Related papers (2024-09-19T06:52:34Z)
- LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z)
- Shrinking Your TimeStep: Towards Low-Latency Neuromorphic Object Recognition with Spiking Neural Network [5.174808367448261]
Neuromorphic object recognition with spiking neural networks (SNNs) is the cornerstone of low-power neuromorphic computing.
Existing SNNs suffer from significant latency, utilizing 10 to 40 timesteps or more to recognize neuromorphic objects.
In this work, we propose the Shrinking SNN (SSNN) to achieve low-latency neuromorphic object recognition without reducing performance.
arXiv Detail & Related papers (2024-01-02T02:05:05Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z)
- Direct Training via Backpropagation for Ultra-low Latency Spiking Neural Networks with Multi-threshold [3.286515597773624]
Spiking neural networks (SNNs) can utilize temporal information and are inherently energy-efficient.
We propose a novel training method based on backpropagation (BP) for ultra-low latency (1-2 time steps) SNNs with a multi-threshold model.
Our proposed method achieves an average accuracy of 99.56%, 93.08%, and 87.90% on MNIST, FashionMNIST, and CIFAR10, respectively with only 2 time steps.
arXiv Detail & Related papers (2021-11-25T07:04:28Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and the hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- One Timestep is All You Need: Training Spiking Neural Networks with Ultra Low Latency [8.590196535871343]
Spiking Neural Networks (SNNs) are energy-efficient alternatives to commonly used deep neural networks (DNNs).
High inference latency is a significant hindrance to the edge deployment of deep SNNs.
We propose an Iterative Initialization and Retraining method for SNNs (IIR-SNN) to perform single shot inference in the temporal axis.
arXiv Detail & Related papers (2021-10-01T22:54:59Z)
- Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression [12.37129078618206]
Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks.
Most SNN training frameworks yield large inference latency which translates to increased spike activity and reduced energy efficiency.
This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity.
arXiv Detail & Related papers (2021-07-16T18:23:36Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy [51.861168222799186]
Spiking Neural Networks (SNNs) are a type of neuromorphic, or brain-inspired network.
SNNs are sparse, accessing very few weights, and typically only use addition operations instead of the more power-intensive multiply-and-accumulate operations.
In this work, we aim to overcome the limitations of TTFS-encoded neuromorphic systems.
arXiv Detail & Related papers (2020-06-03T15:55:53Z)