Fast Algorithms for Spiking Neural Network Simulation with FPGAs
- URL: http://arxiv.org/abs/2405.02019v1
- Date: Fri, 3 May 2024 11:39:25 GMT
- Title: Fast Algorithms for Spiking Neural Network Simulation with FPGAs
- Authors: Björn A. Lindqvist, Artur Podobas
- Abstract summary: We create spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA).
Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory.
This result is the first for simulating the circuit on a single hardware accelerator.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using OpenCL-based high-level synthesis, we create a number of spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA). Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory. Speed-wise they compare favorably to the state-of-the-art GPU-based simulators and their energy usage is lower than any other published result. This result is the first for simulating the circuit on a single hardware accelerator. We also extensively analyze the techniques and algorithms we implement our simulators with, many of which can be realized on other types of hardware. Thus, this article is of interest to any researcher or practitioner interested in efficient SNN simulation, whether they target FPGAs or not.
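The Potjans-Diesmann microcircuit is built from leaky integrate-and-fire (LIF) neurons exchanging discrete synaptic events, which is the workload such simulators accelerate. As a rough illustration of the per-timestep work involved, here is a minimal Python sketch with purely illustrative parameters (not the authors' OpenCL implementation, and not the model's actual values):

```python
import numpy as np

# Illustrative LIF parameters, not the paper's or the model's actual values.
DT = 1e-4            # timestep (s)
TAU_M = 0.01         # membrane time constant (s)
V_THRESH = 1.0       # spike threshold
V_RESET = 0.0        # reset potential

rng = np.random.default_rng(0)
n = 1000                                  # toy network size
v = np.zeros(n)                           # membrane potentials
weights = rng.normal(0.0, 0.05, (n, n))   # dense toy connectivity
decay = np.exp(-DT / TAU_M)

spikes = np.zeros(n, dtype=bool)
for step in range(100):
    # Every (presynaptic spike, postsynaptic target) pair is one
    # "synaptic event" -- the unit behind the paper's nJ-per-event figure.
    syn = weights[:, spikes].sum(axis=1)
    ext = rng.poisson(2.0, n) * 0.06      # background Poisson drive
    v = v * decay + syn + ext             # leak + integrate
    spikes = v >= V_THRESH                # threshold crossing
    v[spikes] = V_RESET                   # reset neurons that fired
```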
Related papers
- Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs [11.87665112550076]
The primary methods for simulating quantum hardware are based on state vectors and tensor networks.
As the number of qubits and quantum gates grows larger, traditional state-vector based quantum circuit simulation methods prove inadequate due to the overwhelming size of the Hilbert space and extensive entanglement.
In this study, we propose general optimization strategies from two aspects: computational efficiency and accuracy.
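The state-vector memory argument is easy to make concrete: a state vector stores 2^n complex amplitudes, and every gate touches it. A minimal numpy sketch of state-vector gate application (illustrative only; the paper's contribution is the tensor-network alternative):

```python
import numpy as np

n = 20                                    # qubits; memory is 2**n amplitudes
state = np.zeros(2**n, dtype=np.complex64)
state[0] = 1.0                            # |00...0>

H = np.array([[1, 1], [1, -1]], dtype=np.complex64) / np.sqrt(2)

def apply_1q(state, gate, target, n):
    """Apply a single-qubit gate: reshape so `target` is its own axis."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, target, 0)
    psi = np.tensordot(gate, psi, axes=(1, 0))
    return np.moveaxis(psi, 0, target).reshape(-1)

state = apply_1q(state, H, target=3, n=n)
# At 8 bytes per complex64 amplitude, 50 qubits would need
# 2**50 * 8 B ~ 9 PB -- why tensor networks replace the full vector.
```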
arXiv Detail & Related papers (2023-10-06T02:24:05Z)
- SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices [44.440915387556544]
Adiabatic quantum-flux-parametron (AQFP) devices serve as excellent carriers for binary neural network (BNN) computations.
We propose SupeRBNN, an AQFP-based randomized BNN acceleration framework.
We show that our design achieves energy efficiency approximately 7.8×10^4 times higher than that of the ReRAM-based BNN framework.
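BNN inference reduces multiply-accumulate to XNOR plus popcount over ±1 values, which is what makes BNNs a natural fit for switching logic such as AQFP. A plain-Python sketch of the standard binary dot product (a generic BNN technique, with no AQFP specifics):

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two {-1,+1}^n vectors packed as n-bit ints
    (bit 1 encodes +1, bit 0 encodes -1): XNOR then popcount."""
    mask = (1 << n) - 1
    matches = ~(a_bits ^ b_bits) & mask   # XNOR: bits where signs agree
    pop = bin(matches).count("1")
    return 2 * pop - n                    # agreements minus disagreements

# Example: a = [+1,-1,+1,+1], b = [+1,+1,-1,+1] -> dot product 0
a = 0b1101                                # LSB-first packing of a
b = 0b1011
assert binary_dot(a, b, 4) == 0
```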
arXiv Detail & Related papers (2023-09-21T16:14:42Z)
- Tricking AI chips into Simulating the Human Brain: A Detailed Performance Analysis [0.5354801701968198]
We evaluate multiple cutting-edge AI chips (Graphcore IPU, GroqChip, Nvidia GPU with Tensor Cores and Google TPU) for brain simulation.
Our performance analysis reveals that the simulation problem maps extremely well onto the GPU and TPU architectures.
The GroqChip outperforms both platforms for small networks but, due to implementing some floating-point operations at reduced accuracy, is found not yet usable for brain simulation.
arXiv Detail & Related papers (2023-01-31T13:51:37Z)
- Towards real-time and energy efficient Siamese tracking -- a hardware-software approach [0.0]
We propose a hardware-software implementation of the well-known fully-convolutional Siamese tracker (SiamFC).
We have developed a quantised Siamese network for the FINN accelerator, using algorithm-accelerator co-design, and performed design space exploration.
For our network, running in the programmable logic part of the Zynq UltraScale+ MPSoC ZCU104, we achieved almost 50 frames per second with tracker accuracy on par with that of its floating-point counterpart.
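SiamFC-style trackers score a search region by cross-correlating the exemplar (template) embedding with the search-region embedding and taking the peak of the response map. A minimal numpy sketch with random features standing in for the CNN embeddings (all shapes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
C, tH, tW = 8, 6, 6          # channels, template spatial size (hypothetical)
sH, sW = 22, 22              # search-region feature size (hypothetical)
template = rng.normal(size=(C, tH, tW))   # exemplar embedding phi(z)
search = rng.normal(size=(C, sH, sW))     # search embedding phi(x)

# Dense cross-correlation: slide the template over the search features,
# summing over channels -- the core scoring op in SiamFC.
oH, oW = sH - tH + 1, sW - tW + 1
response = np.empty((oH, oW))
for i in range(oH):
    for j in range(oW):
        response[i, j] = np.sum(template * search[:, i:i+tH, j:j+tW])

peak = np.unravel_index(response.argmax(), response.shape)
print("predicted target offset:", peak)
```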
arXiv Detail & Related papers (2022-05-21T18:31:07Z)
- Parallel Simulation of Quantum Networks with Distributed Quantum State Management [56.24769206561207]
We identify requirements for parallel simulation of quantum networks and develop the first parallel discrete event quantum network simulator.
Our contributions include the design and development of a quantum state manager that maintains shared quantum information distributed across multiple processes.
We release the parallel SeQUeNCe simulator as an open-source tool alongside the existing sequential version.
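At its core, a discrete-event simulator is a priority queue of timestamped events; parallelizing it means distributing this loop across processes while keeping shared quantum state consistent. A sequential skeleton of the general pattern in Python (a sketch, not SeQUeNCe's actual API):

```python
import heapq

class Simulator:
    """Minimal sequential discrete-event loop (illustrative pattern)."""
    def __init__(self):
        self.now = 0.0
        self.queue = []      # heap of (time, seq, callback)
        self.seq = 0         # tie-breaker so heapq never compares callbacks

    def schedule(self, delay, callback):
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback))
        self.seq += 1

    def run(self, until):
        while self.queue and self.queue[0][0] <= until:
            self.now, _, callback = heapq.heappop(self.queue)
            callback()

sim = Simulator()
def send_qubit():
    print(f"t={sim.now:.1f}: photon arrives at node B")
    sim.schedule(0.5, lambda: print(f"t={sim.now:.1f}: result back at A"))
sim.schedule(1.0, send_qubit)
sim.run(until=10.0)
```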
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate the resulting bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
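Neighborhood sampling bounds the per-batch work: rather than expanding a seed node's full multi-hop neighborhood, each hop keeps at most a fixed fanout of random neighbors. A small Python sketch of the idea (not the paper's performance-engineered sampler):

```python
import random

def sample_neighborhood(adj, seeds, fanouts, rng=random.Random(0)):
    """Layer-wise neighborhood sampling a la GraphSAGE: at each hop,
    keep at most `fanout` random neighbors of the current frontier."""
    layers, frontier = [], set(seeds)
    for fanout in fanouts:
        edges = []
        for u in frontier:
            nbrs = adj.get(u, [])
            picked = rng.sample(nbrs, min(fanout, len(nbrs)))
            edges += [(u, v) for v in picked]
        layers.append(edges)
        frontier = {v for _, v in edges}   # expand to the next hop
    return layers

adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 5], 3: [0], 4: [0, 5], 5: [2, 4]}
print(sample_neighborhood(adj, seeds=[0], fanouts=[2, 2]))
```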
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
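The connection to linear systems comes from the infinite-width limit, where gradient-descent training converges to kernel regression, i.e. solving (K + λI)α = y for a kernel matrix K. A numpy sketch with a stand-in RBF kernel where the neural tangent kernel would appear in the real analysis (illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # toy inputs
y = np.sign(X[:, 0])                      # toy binary labels

# Stand-in RBF kernel; the NTK plays this role in the actual analysis.
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

lam = 1e-3                                # ridge term keeps K + lam*I well-conditioned
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

preds = np.sign(K @ alpha)                # in-sample kernel predictions
print("train accuracy:", (preds == y).mean())
# Quantum linear-system solvers are fast exactly when systems like this
# are well-conditioned -- the overlap in conditions the paper identifies.
```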
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712× speedup with 1301.25× energy reduction over CPU, and a 35.4× speedup with 17.66× energy reduction over GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks [0.2148535041822524]
Convolutional neural networks (CNNs) are typically trained using 16- or 32-bit floating-point (FP) arithmetic.
Low-precision FP can be highly effective for inference.
Existing processors do not generally support custom-precision FP.
We propose hardware-optimized bitslice-parallel floating-point operators (HOBFLOPS).
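Bitslicing treats a w-bit machine word as w independent 1-bit lanes, so one sequence of bitwise instructions evaluates the same gate network across all lanes at once; HOBFLOPS assembles FP arithmetic from such networks. A Python sketch of the core trick, a bitsliced ripple-carry adder running 64 integer lanes in parallel (illustrative; integer rather than floating-point):

```python
def bitsliced_add(a_planes, b_planes):
    """Add 64 independent k-bit integers at once. a_planes[i] holds
    bit i of all 64 lanes packed into one 64-bit word (bit-plane layout)."""
    out, carry = [], 0
    for a, b in zip(a_planes, b_planes):   # full adder, one plane at a time
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a | b))
    return out

def to_planes(values, k):
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
            for i in range(k)]

def from_planes(planes, lane):
    return sum(((p >> lane) & 1) << i for i, p in enumerate(planes))

xs = list(range(64))
ys = [3] * 64
planes = bitsliced_add(to_planes(xs, 8), to_planes(ys, 8))
assert all(from_planes(planes, l) == (xs[l] + 3) % 256 for l in range(64))
```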
arXiv Detail & Related papers (2020-07-11T00:37:35Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High-speed, low-energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated by simulating the prediction of Boston housing prices and by training a two-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
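"One computational step" here means the closed-form least-squares solution rather than iterative gradient descent; the crosspoint array computes it in a single analog operation. A digital numpy sketch of that one step (an illustrative stand-in for the analog hardware):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 13                            # samples x features (Boston-sized)
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.01 * rng.normal(size=n)

# "One-step" learning: the closed-form least-squares solution,
# which the crosspoint array produces in a single analog operation.
Xb = np.hstack([X, np.ones((n, 1))])      # add a bias column
w = np.linalg.lstsq(Xb, y, rcond=None)[0]

print("weight recovery error:", np.linalg.norm(w[:-1] - true_w))
```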
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural Network Approach [5.365198933008246]
CSM-NN is a scalable simulation framework with optimized neural network structures and processing algorithms.
Experiments show that CSM-NN reduces simulation time by up to 6× compared to a state-of-the-art current source model based simulator running on a CPU.
CSM-NN also provides high accuracy, with less than 2% error compared to HSPICE.
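The idea behind an NN-based current source model is to replace table lookup and interpolation with a small network mapping operating conditions (e.g. input slew, load capacitance) to the currents or delays the simulator needs. A toy numpy forward pass showing the shape of such a surrogate (hypothetical sizes, untrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-16-16-1 MLP: (input slew, load cap) -> predicted delay.
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(1, 16)), np.zeros(1)

def surrogate(slew_ps, cload_ff):
    """One inference replaces a lookup-and-interpolate simulator step."""
    x = np.array([slew_ps, cload_ff])
    h = np.tanh(W1 @ x + b1)
    h = np.tanh(W2 @ h + b2)
    return (W3 @ h + b3)[0]

print(surrogate(20.0, 5.0))   # would be a meaningful delay after training
```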
arXiv Detail & Related papers (2020-02-13T00:29:44Z)