Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared
Atomics
- URL: http://arxiv.org/abs/2107.04092v1
- Date: Thu, 8 Jul 2021 20:13:54 GMT
- Title: Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared
Atomics
- Authors: Dennis Bautembach, Iason Oikonomidis, Antonis Argyros
- Abstract summary: We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators.
The first one targets spike timing dependent plasticity (STDP) and efficiently facilitates the computation of pre- and post-synaptic spikes.
The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need to be updated at any given time.
- Score: 0.8360870648463651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two novel optimizations that accelerate clock-based spiking neural
network (SNN) simulators. The first one targets spike timing dependent
plasticity (STDP). It combines lazy- with event-driven plasticity and
efficiently facilitates the computation of pre- and post-synaptic spikes using
bitfields and integer intrinsics. It offers higher bandwidth than event-driven
plasticity alone and achieves a 1.5x-2x speedup over our closest competitor.
The second optimization targets spike delivery. We partition our graph
representation in a way that bounds the number of neurons that need be updated
at any given time which allows us to perform said update in shared memory
instead of global memory. This is 2x-2.5x faster than our closest competitor.
Both optimizations represent the final evolutionary stages of years of
iteration on STDP and spike delivery inside "Spice" (/spaɪk/), our state of the
art SNN simulator. The proposed optimizations are not exclusive to our graph
representation or pipeline but are applicable to a multitude of simulator
designs. We evaluate our performance on three well-established models and
compare ourselves against three other state of the art simulators.
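The abstract describes both optimizations only at a high level. The two CUDA sketches below illustrate the general ideas under stated assumptions; all names, data layouts, and the simplified pairing rule are hypothetical and are not taken from Spice or the paper.

A per-neuron spike history packed into a bitfield lets a lazy, event-driven plasticity update recover recent pre- and post-synaptic spike times with integer intrinsics (__popc, __ffs) instead of replaying every timestep:

```cuda
#include <cstdint>

// Hypothetical per-neuron spike history: bit k is 1 if the neuron fired
// k timesteps ago (bit 0 = current step). Layout and names are illustrative.
struct SpikeHistory {
    uint32_t bits;  // rolling 32-step window of spikes
};

// Shift the history each simulation step and record whether the neuron fired.
__device__ inline void record_step(SpikeHistory& h, bool fired) {
    h.bits = (h.bits << 1) | (fired ? 1u : 0u);
}

// Lazy STDP update, run only when a spike makes the synapse "visible".
// Instead of touching the synapse every timestep, recent pre-/post-synaptic
// activity is reconstructed from the bitfields with integer intrinsics:
// __popc counts spikes, __ffs locates the most recent one.
// The pairing rule below is deliberately simplified for illustration.
__device__ float lazy_stdp_update(uint32_t pre_bits, uint32_t post_bits,
                                  int steps_since_update,
                                  float a_plus, float a_minus, float tau) {
    // Only account for the steps that have not been processed yet.
    uint32_t mask = (steps_since_update >= 32)
                        ? 0xFFFFFFFFu
                        : ((1u << steps_since_update) - 1u);
    uint32_t pre  = pre_bits  & mask;
    uint32_t post = post_bits & mask;

    float dw = 0.0f;
    if (post) {  // potentiation: post spikes following a recent pre spike
        int dt_pre = __ffs((int)pre);           // 1-based index, 0 if no pre spike
        if (dt_pre) dw += a_plus * __popc(post) * __expf(-(float)(dt_pre - 1) / tau);
    }
    if (pre) {   // depression: pre spikes following a recent post spike
        int dt_post = __ffs((int)post);
        if (dt_post) dw -= a_minus * __popc(pre) * __expf(-(float)(dt_post - 1) / tau);
    }
    return dw;
}
```

For spike delivery, if the connectivity graph is partitioned so that every synapse handled by one thread block targets a bounded, contiguous slice of neurons, the per-step input currents can be accumulated with atomics in fast shared memory and written back to global memory once per partition:

```cuda
// Hypothetical spike-delivery kernel. Assumption: the graph was partitioned so
// that all synapses of partition `blockIdx.x` target neurons in the index range
// [tile_base[blockIdx.x], tile_base[blockIdx.x] + TILE). Names are illustrative.
constexpr int TILE = 1024;  // max distinct postsynaptic neurons per partition

struct Synapse {
    int   post;    // postsynaptic neuron index (global)
    float weight;
};

__global__ void deliver_spikes(const int* __restrict__ spiking_syn_begin,
                               const int* __restrict__ spiking_syn_end,
                               const Synapse* __restrict__ synapses,
                               float* __restrict__ input_current,
                               const int* __restrict__ tile_base) {
    __shared__ float local_current[TILE];

    const int part = blockIdx.x;          // one partition per thread block
    const int base = tile_base[part];     // first neuron index of this partition

    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        local_current[i] = 0.0f;
    __syncthreads();

    // Accumulate all spikes routed to this partition using shared-memory atomics.
    for (int s = spiking_syn_begin[part] + threadIdx.x;
         s < spiking_syn_end[part]; s += blockDim.x) {
        const Synapse syn = synapses[s];
        atomicAdd(&local_current[syn.post - base], syn.weight);
    }
    __syncthreads();

    // One coalesced write-back per partition instead of scattered global atomics.
    // Global atomicAdd only in case several partitions map onto the same slice.
    for (int i = threadIdx.x; i < TILE; i += blockDim.x)
        atomicAdd(&input_current[base + i], local_current[i]);
}
```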
Related papers
- Fast Algorithms for Spiking Neural Network Simulation with FPGAs [0.0]
We create spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA)
Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottle-necked by the device's on-chip memory.
This result is the first for simulating the circuit on a single hardware accelerator.
arXiv Detail & Related papers (2024-05-03T11:39:25Z) - MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction [37.07128043394227]
This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs.
We present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks.
arXiv Detail & Related papers (2024-04-30T12:56:14Z) - 8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Stateful optimizers maintain statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values.
This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters.
In this paper, we develop the first optimizers that use 8-bit statistics while maintaining the performance levels of using 32-bit optimizer states; a minimal sketch of block-wise quantization appears after this list.
arXiv Detail & Related papers (2021-10-06T15:43:20Z) - Early Convolutions Help Transformers See Better [63.21712652156238]
Vision transformer (ViT) models exhibit substandard optimizability.
Modern convolutional neural networks are far easier to optimize.
Using a convolutional stem in ViT dramatically increases optimization stability and also improves peak performance.
arXiv Detail & Related papers (2021-06-28T17:59:33Z) - Multi-GPU SNN Simulation with Perfect Static Load Balancing [0.8360870648463651]
We present an SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs.
This is made possible by 1) a novel, cache-aware spike transmission algorithm, 2) a model-parallel multi-GPU distribution scheme, and 3) a static, yet very effective, load balancing strategy.
arXiv Detail & Related papers (2021-02-09T07:07:34Z) - Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 which strikes a balance among speed, accuracy and stability.
We present a virtual synthesis method to transform one still image into a short video which incorporates in-plane and out-of-plane face movement.
arXiv Detail & Related papers (2020-09-21T15:37:37Z) - FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
arXiv Detail & Related papers (2020-08-12T04:26:18Z) - TASO: Time and Space Optimization for Memory-Constrained DNN Inference [5.023660118588569]
Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices.
We propose an approach for ahead-of-time domain specific optimization of CNN models, based on an integer linear programming (ILP) for selecting primitive operations to implement convolutional layers.
arXiv Detail & Related papers (2020-05-21T15:08:06Z) - Steepest Descent Neural Architecture Optimization: Escaping Local
Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting stability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z) - Event-Based Angular Velocity Regression with Spiking Networks [51.145071093099396]
Spiking Neural Networks (SNNs) process information conveyed as temporal spikes rather than numeric values.
We propose, for the first time, a temporal regression problem of numerical values given events from an event camera.
We show that we can successfully train an SNN to perform angular velocity regression.
arXiv Detail & Related papers (2020-03-05T17:37:16Z) - FastWave: Accelerating Autoregressive Convolutional Neural Networks on
FPGA [27.50143717931293]
WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution.
We develop the first accelerator platform, FastWave, for autoregressive convolutional neural networks.
arXiv Detail & Related papers (2020-02-09T06:15:09Z)
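The 8-bit optimizer entry above describes keeping optimizer statistics in 8-bit form via block-wise quantization. The sketch below shows only the generic block-wise absmax quantization idea, with hypothetical names and launch assumptions (blockDim.x == BLOCK); the cited paper's actual scheme (dynamic quantization, stable embedding layer) is more involved.

```cuda
#include <cstdint>

constexpr int BLOCK = 256;  // values sharing one quantization scale

// Quantize an optimizer state buffer block by block: each block of BLOCK
// consecutive floats is mapped onto int8 using its own absolute maximum
// as the scale. Launch with blockDim.x == BLOCK, gridDim.x == ceil(n / BLOCK).
__global__ void quantize_state_blockwise(const float* __restrict__ state,
                                         int8_t* __restrict__ q_state,
                                         float* __restrict__ scales,
                                         int n) {
    __shared__ float red[BLOCK];
    __shared__ float absmax_sh;

    const int i = blockIdx.x * BLOCK + threadIdx.x;

    // 1) Block-wide reduction to find the absolute maximum of this block.
    red[threadIdx.x] = (i < n) ? fabsf(state[i]) : 0.0f;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            red[threadIdx.x] = fmaxf(red[threadIdx.x], red[threadIdx.x + stride]);
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        absmax_sh = fmaxf(red[0], 1e-8f);  // avoid division by zero
        scales[blockIdx.x] = absmax_sh;    // stored for dequantization
    }
    __syncthreads();

    // 2) Map [-absmax, absmax] onto [-127, 127] and round to the nearest int8.
    if (i < n) {
        float scaled = state[i] / absmax_sh * 127.0f;
        scaled = fminf(fmaxf(scaled, -127.0f), 127.0f);
        q_state[i] = static_cast<int8_t>(__float2int_rn(scaled));
    }
}
```

Dequantization simply multiplies each int8 value by `scales[block] / 127.0f`; the memory saving comes from storing one float scale per BLOCK values instead of a full-precision state per value.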