Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared
Atomics
- URL: http://arxiv.org/abs/2107.04092v1
- Date: Thu, 8 Jul 2021 20:13:54 GMT
- Title: Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared
Atomics
- Authors: Dennis Bautembach, Iason Oikonomidis, Antonis Argyros
- Abstract summary: We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators.
The first one targets spike timing dependent plasticity (STDP) and efficiently facilitates the computation of pre- and post-synaptic spikes.
The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need to be updated at any given time.
- Score: 0.8360870648463651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two novel optimizations that accelerate clock-based spiking neural
network (SNN) simulators. The first one targets spike timing dependent
plasticity (STDP). It combines lazy- with event-driven plasticity and
efficiently facilitates the computation of pre- and post-synaptic spikes using
bitfields and integer intrinsics. It offers higher bandwidth than event-driven
plasticity alone and achieves a 1.5x-2x speedup over our closest competitor.
The second optimization targets spike delivery. We partition our graph
representation in a way that bounds the number of neurons that need be updated
at any given time which allows us to perform said update in shared memory
instead of global memory. This is 2x-2.5x faster than our closest competitor.
Both optimizations represent the final evolutionary stages of years of
iteration on STDP and spike delivery inside "Spice" (/spaɪk/), our state of the
art SNN simulator. The proposed optimizations are not exclusive to our graph
representation or pipeline but are applicable to a multitude of simulator
designs. We evaluate our performance on three well-established models and
compare ourselves against three other state of the art simulators.
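The abstract describes both optimizations only at a high level. The two CUDA sketches below illustrate the general ideas under stated assumptions; all names, data layouts, and the simplified pairing rule are hypothetical and are not taken from Spice or the paper.

A per-neuron spike history packed into a bitfield lets a lazy, event-driven plasticity update recover recent pre- and post-synaptic spike times with integer intrinsics (__popc, __ffs) instead of replaying every timestep:

```cuda
#include <cstdint>

// Hypothetical per-neuron spike history: bit k is 1 if the neuron fired
// k timesteps ago (bit 0 = current step). Layout and names are illustrative.
struct SpikeHistory {
    uint32_t bits;  // rolling 32-step window of spikes
};

// Shift the history each simulation step and record whether the neuron fired.
__device__ inline void record_step(SpikeHistory& h, bool fired) {
    h.bits = (h.bits << 1) | (fired ? 1u : 0u);
}

// Lazy STDP update, run only when a spike makes the synapse "visible".
// Instead of touching the synapse every timestep, recent pre-/post-synaptic
// activity is reconstructed from the bitfields with integer intrinsics:
// __popc counts spikes, __ffs locates the most recent one.
// The pairing rule below is deliberately simplified for illustration.
__device__ float lazy_stdp_update(uint32_t pre_bits, uint32_t post_bits,
                                  int steps_since_update,
                                  float a_plus, float a_minus, float tau) {
    // Only account for the steps that have not been processed yet.
    uint32_t mask = (steps_since_update >= 32)
                        ? 0xFFFFFFFFu
                        : ((1u << steps_since_update) - 1u);
    uint32_t pre  = pre_bits  & mask;
    uint32_t post = post_bits & mask;

    float dw = 0.0f;
    if (post) {  // potentiation: post spikes following a recent pre spike
        int dt_pre = __ffs((int)pre);           // 1-based index, 0 if no pre spike
        if (dt_pre) dw += a_plus * __popc(post) * __expf(-(float)(dt_pre - 1) / tau);
    }
    if (pre) {   // depression: pre spikes following a recent post spike
        int dt_post = __ffs((int)post);
        if (dt_post) dw -= a_minus * __popc(pre) * __expf(-(float)(dt_post - 1) / tau);
    }
    return dw;
}
```

For spike delivery, if the connectivity graph is partitioned so that every synapse handled by one thread block targets a bounded, contiguous slice of neurons, the per-step input currents can be accumulated with atomics in fast shared memory and written back to global memory once per partition:

```cuda
// Hypothetical spike-delivery kernel. Assumption: the graph was partitioned so
// that all synapses of partition `blockIdx.x` target neurons in the index range
// [tile_base[blockIdx.x], tile_base[blockIdx.x] + TILE). Names are illustrative.
constexpr int TILE = 1024;  // max distinct postsynaptic neurons per partition

struct Synapse {
    int   post;    // postsynaptic neuron index (global)
    float weight;
};

__global__ void deliver_spikes(const int* __restrict__ spiking_syn_begin,
                               const int* __restrict__ spiking_syn_end,
                               const Synapse* __restrict__ synapses,
                               float* __restrict__ input_current,
                               const int* __restrict__ tile_base) {
    __shared__ float local_current[TILE];

    const int part = blockIdx.x;          // one partition per thread block
    const int base = tile_base[part];     // first neuron index of this partition

    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        local_current[i] = 0.0f;
    __syncthreads();

    // Accumulate all spikes routed to this partition using shared-memory atomics.
    for (int s = spiking_syn_begin[part] + threadIdx.x;
         s < spiking_syn_end[part]; s += blockDim.x) {
        const Synapse syn = synapses[s];
        atomicAdd(&local_current[syn.post - base], syn.weight);
    }
    __syncthreads();

    // One coalesced write-back per partition instead of scattered global atomics.
    // Global atomicAdd only in case several partitions map onto the same slice.
    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        atomicAdd(&input_current[base + i], local_current[i]);
}
```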
Related papers
- Fast Algorithms for Spiking Neural Network Simulation with FPGAs [0.0]
We create spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA)
Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory.
This result is the first for simulating the circuit on a single hardware accelerator.
arXiv Detail & Related papers (2024-05-03T11:39:25Z) - MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction [37.07128043394227]
This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs.
We present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks.
arXiv Detail & Related papers (2024-04-30T12:56:14Z) - 8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Stateful optimizers maintain statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values.
This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters.
In this paper, we develop the first optimizers that use 8-bit statistics while maintaining the performance levels of using 32-bit optimizer states; a minimal sketch of block-wise quantization appears after this list.
arXiv Detail & Related papers (2021-10-06T15:43:20Z) - Early Convolutions Help Transformers See Better [63.21712652156238]
Vision transformer (ViT) models exhibit substandard optimizability.
Modern convolutional neural networks are far easier to optimize.
Using a convolutional stem in ViT dramatically increases optimization stability and also improves peak performance.
arXiv Detail & Related papers (2021-06-28T17:59:33Z) - Multi-GPU SNN Simulation with Perfect Static Load Balancing [0.8360870648463651]
We present an SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs.
This is made possible by 1) a novel, cache-aware spike transmission algorithm, 2) a model-parallel multi-GPU distribution scheme, and 3) a static, yet very effective, load balancing strategy.
arXiv Detail & Related papers (2021-02-09T07:07:34Z) - Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 which strikes a balance among speed, accuracy and stability.
We present a virtual synthesis method to transform one still image into a short video which incorporates in-plane and out-of-plane face movement.
arXiv Detail & Related papers (2020-09-21T15:37:37Z) - FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
arXiv Detail & Related papers (2020-08-12T04:26:18Z) - TASO: Time and Space Optimization for Memory-Constrained DNN Inference [5.023660118588569]
Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices.
We propose an approach for ahead-of-time domain specific optimization of CNN models, based on an integer linear programming (ILP) for selecting primitive operations to implement convolutional layers.
arXiv Detail & Related papers (2020-05-21T15:08:06Z) - Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting stability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z) - Event-Based Angular Velocity Regression with Spiking Networks [51.145071093099396]
Spiking Neural Networks (SNNs) process information conveyed as temporal spikes rather than numeric values.
We propose, for the first time, a temporal regression problem of numerical values given events from an event camera.
We show that we can successfully train an SNN to perform angular velocity regression.
arXiv Detail & Related papers (2020-03-05T17:37:16Z) - FastWave: Accelerating Autoregressive Convolutional Neural Networks on
FPGA [27.50143717931293]
WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution.
We develop the first accelerator platform, FastWave, for autoregressive convolutional neural networks.
arXiv Detail & Related papers (2020-02-09T06:15:09Z)
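The 8-bit optimizer entry above describes keeping optimizer statistics in 8-bit form via block-wise quantization. The sketch below shows only the generic block-wise absmax quantization idea, with hypothetical names and launch assumptions (blockDim.x == BLOCK); the cited paper's actual scheme (dynamic quantization, stable embedding layer) is more involved.

```cuda
#include <cstdint>

constexpr int BLOCK = 256;  // values sharing one quantization scale

// Quantize an optimizer state buffer block by block: each block of BLOCK
// consecutive floats is mapped onto int8 using its own absolute maximum
// as the scale. Launch with blockDim.x == BLOCK, gridDim.x == ceil(n / BLOCK).
__global__ void quantize_state_blockwise(const float* __restrict__ state,
                                         int8_t* __restrict__ q_state,
                                         float* __restrict__ scales,
                                         int n) {
    __shared__ float red[BLOCK];
    __shared__ float absmax_sh;

    const int i = blockIdx.x * BLOCK + threadIdx.x;

    // 1) Block-wide reduction to find the absolute maximum of this block.
    red[threadIdx.x] = (i < n) ? fabsf(state[i]) : 0.0f;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            red[threadIdx.x] = fmaxf(red[threadIdx.x], red[threadIdx.x + stride]);
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        absmax_sh = fmaxf(red[0], 1e-8f);  // avoid division by zero
        scales[blockIdx.x] = absmax_sh;    // stored for dequantization
    }
    __syncthreads();

    // 2) Map [-absmax, absmax] onto [-127, 127] and round to the nearest int8.
    if (i < n) {
        float scaled = state[i] / absmax_sh * 127.0f;
        scaled = fminf(fmaxf(scaled, -127.0f), 127.0f);
        q_state[i] = static_cast<int8_t>(__float2int_rn(scaled));
    }
}
```

Dequantization simply multiplies each int8 value by `scales[block] / 127.0f`; the memory saving comes from storing one float scale per BLOCK values instead of a full-precision state per value.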