Efficient Synaptic Delay Implementation in Digital Event-Driven AI Accelerators
- URL: http://arxiv.org/abs/2501.13610v1
- Date: Thu, 23 Jan 2025 12:30:04 GMT
- Title: Efficient Synaptic Delay Implementation in Digital Event-Driven AI Accelerators
- Authors: Roy Meijer, Paul Detterer, Amirreza Yousefzadeh, Alberto Patino-Saucedo, Guangzhi Tang, Kanishkan Vadivel, Yinfu Xu, Manil-Dev Gomony, Federico Corradi, Bernabe Linares-Barranco, Manolis Sifalakis
- Abstract summary: We introduce Shared Circular Delay Queue (SCDQ), a novel hardware structure for supporting synaptic delays on digital neuromorphic accelerators.
Our analysis and hardware results show that it scales better in terms of memory than currently common approaches, and is more amortizable to algorithm-hardware co-optimizations.
- Score: 1.260842513389711
- Abstract: Synaptic delay parameterization of neural network models has remained largely unexplored, but recent literature has shown promising results, suggesting that delay-parameterized models are simpler, smaller, sparser, and thus more energy efficient than non-delay-parameterized models of similar performance (e.g., task accuracy). We introduce the Shared Circular Delay Queue (SCDQ), a novel hardware structure for supporting synaptic delays on digital neuromorphic accelerators. Our analysis and hardware results show that it scales better in terms of memory than currently common approaches and is more amortizable to algorithm-hardware co-optimizations; in fact, its memory scaling is modulated by model sparsity and not merely network size. In addition to memory, we also report performance in terms of latency, area, and energy per inference.
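The abstract describes SCDQ as a single circular queue shared across the synapses of a core, with memory that tracks in-flight delayed events (and hence activity and sparsity) rather than synapse count. The minimal Python sketch below illustrates that idea under our own assumptions; the names (`SharedCircularDelayQueue`, `push`, `tick`) are illustrative, and the paper's actual hardware structure differs in detail.

```python
from collections import defaultdict

class SharedCircularDelayQueue:
    """Minimal software sketch of a shared circular delay queue.

    One ring buffer with D+1 slots is shared by all synapses of a core.
    A spike that must arrive d timesteps from now is appended to slot
    (head + d) % (D+1); each timestep the slot under `head` is drained
    and delivered, then the head advances. Storage grows with the number
    of in-flight delayed events, not with the number of synapses, which
    is the scaling argument suggested by the abstract. This is an
    illustrative approximation, not the authors' hardware design.
    """

    def __init__(self, max_delay: int):
        self.num_slots = max_delay + 1      # slot offset 0 = "deliver now"
        self.slots = defaultdict(list)      # slot index -> pending events
        self.head = 0                       # slot drained this timestep

    def push(self, pre_neuron: int, weight: float, delay: int) -> None:
        assert 0 <= delay < self.num_slots, "delay exceeds queue depth"
        slot = (self.head + delay) % self.num_slots
        self.slots[slot].append((pre_neuron, weight))

    def tick(self):
        """Drain and return all events due at the current timestep."""
        due = self.slots.pop(self.head, [])
        self.head = (self.head + 1) % self.num_slots
        return due


# Usage: deliver a weighted spike 3 timesteps in the future.
q = SharedCircularDelayQueue(max_delay=7)
q.push(pre_neuron=42, weight=0.5, delay=3)
for t in range(4):
    events = q.tick()
    if events:
        print(f"t={t}: deliver {events}")   # fires at t=3
```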
Related papers
- Reduced Order Modeling with Shallow Recurrent Decoder Networks [5.686433280542813]
SHRED-ROM is a robust decoding-only strategy that circumvents the numerically unstable approximation of an inverse.
We show that SHRED-ROM accurately reconstructs the state dynamics for new parameter values starting from limited fixed or mobile sensors.
arXiv Detail & Related papers (2025-02-15T23:41:31Z)
- Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference.
Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms.
We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines.
arXiv Detail & Related papers (2025-02-03T13:09:21Z)
- Efficient Event-based Delay Learning in Spiking Neural Networks [0.1350479308585481]
Spiking Neural Networks (SNNs) are attracting increased attention as an energy-efficient alternative to traditional Neural Networks.
We propose a novel event-based training method for SNNs, grounded in the EventProp formalism.
We show that our approach uses less than half the memory of the current state-of-the-art delay-learning method and is up to 26x faster.
arXiv Detail & Related papers (2025-01-13T13:44:34Z)
- DelGrad: Exact event-based gradients in spiking networks for training delays and weights [1.5226147562426895]
Spiking neural networks (SNNs) inherently rely on the timing of signals for representing and processing information.
We propose DelGrad, an event-based method to compute exact loss gradients for both synaptic weights and delays.
We experimentally demonstrate the memory efficiency and accuracy benefits of adding delays to SNNs on noisy mixed-signal hardware.
arXiv Detail & Related papers (2024-04-30T00:02:34Z)
- Hardware-aware training of models with synaptic delays for digital event-driven neuromorphic processors [1.3415700412919966]
We propose a framework to train and deploy, on digital neuromorphic hardware, high-performing spiking neural network (SNN) models.
The training accounts for platform constraints, such as synaptic weight precision and the total number of parameters per core, as a function of network size.
We evaluate trained models in two neuromorphic digital hardware platforms: Intel Loihi and Imec Seneca.
arXiv Detail & Related papers (2024-04-16T14:22:58Z)
- Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation [80.227864832092]
Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications.
The sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs.
We propose an online propagation framework and two novel node-adaptive propagation methods.
arXiv Detail & Related papers (2023-10-17T05:03:00Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We make efforts to formulate the essential mathematical functions that describe the R-D behavior of NIC using deep networks and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% (100k) of their parameters on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
- Toward fast and accurate human pose estimation via soft-gated skip connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation.
We re-analyze the design of skip connections in the context of improving both accuracy and efficiency over the state of the art.
Our model achieves state-of-the-art results on the MPII and LSP datasets.
arXiv Detail & Related papers (2020-02-25T18:51:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.