FireFly: A High-Throughput Hardware Accelerator for Spiking Neural
Networks with Efficient DSP and Memory Optimization
- URL: http://arxiv.org/abs/2301.01905v5
- Date: Tue, 6 Jun 2023 06:58:26 GMT
- Title: FireFly: A High-Throughput Hardware Accelerator for Spiking Neural
Networks with Efficient DSP and Memory Optimization
- Authors: Jindong Li and Guobin Shen and Dongcheng Zhao and Qian Zhang and Yi
Zeng
- Abstract summary: Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency.
Most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements.
We propose an FPGA accelerator that can process spikes generated by the firing neuron on-the-fly (FireFly)
- Score: 6.966706170499345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking neural networks (SNNs) have been widely used due to their strong
biological interpretability and high energy efficiency. With the introduction
of the backpropagation algorithm and surrogate gradient, the structure of
spiking neural networks has become more complex, and the performance gap with
artificial neural networks has gradually decreased. However, most SNN hardware
implementations for field-programmable gate arrays (FPGAs) cannot meet
arithmetic or memory efficiency requirements, which significantly restricts the
development of SNNs. They do not delve into the arithmetic operations between
the binary spikes and synaptic weights or assume unlimited on-chip RAM
resources by using overly expensive devices on small tasks. To improve
arithmetic efficiency, we analyze the neural dynamics of spiking neurons,
generalize the SNN arithmetic operation to the multiplex-accumulate operation,
and propose a high-performance implementation of such operation by utilizing
the DSP48E2 hard block in Xilinx Ultrascale FPGAs. To improve memory
efficiency, we design a memory system to enable efficient synaptic weights and
membrane voltage memory access with reasonable on-chip RAM consumption.
Combining the above two improvements, we propose an FPGA accelerator that can
process spikes generated by the firing neuron on-the-fly (FireFly). FireFly is
the first SNN accelerator that incorporates DSP optimization techniques into
SNN synaptic operations. FireFly is implemented on several FPGA edge devices
with limited resources but still guarantees a peak performance of 5.53TOP/s at
300MHz. As a lightweight accelerator, FireFly achieves the highest
computational density efficiency compared with existing research using large
FPGA devices.
Related papers
- Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology [2.968768532937366]
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models.
We develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models.
arXiv Detail & Related papers (2024-10-07T05:04:13Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - FireFly v2: Advancing Hardware Support for High-Performance Spiking
Neural Network with a Spatiotemporal FPGA Accelerator [8.0611988136866]
Spiking Neural Networks (SNNs) are expected to be a promising alternative to Artificial Neural Networks (ANNs)
Specialized SNN hardware offers clear advantages over general-purpose devices in terms of power and performance.
FireFly v2, an FPGA SNN accelerator, can address the issue of non-spike operation in current SOTA SNN algorithms.
arXiv Detail & Related papers (2023-09-28T04:17:02Z) - A Resource-efficient Spiking Neural Network Accelerator Supporting
Emerging Neural Encoding [6.047137174639418]
Spiking neural networks (SNNs) recently gained momentum due to their low-power multiplication-free computing.
SNNs require very long spike trains (up to 1000) to reach an accuracy similar to their artificial neural network (ANN) counterparts for large models.
We present a novel hardware architecture that can efficiently support SNN with emerging neural encoding.
arXiv Detail & Related papers (2022-06-06T10:56:25Z) - FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs)
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z) - Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking
Neural Networks [0.0]
Spiking Neural Networks (SNNs) compute in an event-based matter to achieve a more efficient computation than standard Neural Networks.
We propose a novel architecture that is optimized for the processing of Convolutional SNNs that feature a high degree of activation sparsity.
arXiv Detail & Related papers (2022-03-23T14:18:58Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function
Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware
Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.