Memory-Free and Parallel Computation for Quantized Spiking Neural Networks
- URL: http://arxiv.org/abs/2503.00040v1
- Date: Tue, 25 Feb 2025 10:34:25 GMT
- Title: Memory-Free and Parallel Computation for Quantized Spiking Neural Networks
- Authors: Dehao Zhang, Shuai Wang, Yichen Xiao, Wenjie Wei, Yimeng Shan, Malu Zhang, Yang Yang
- Abstract summary: Quantized Spiking Neural Networks (QSNNs) offer superior energy efficiency and are well-suited for deployment on resource-limited edge devices. However, limited bit-width weights and membrane potentials result in a notable performance decline. We introduce a memory-free quantization method that captures all historical information without directly storing membrane potentials.
- Score: 12.227968342252026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantized Spiking Neural Networks (QSNNs) offer superior energy efficiency and are well-suited for deployment on resource-limited edge devices. However, limited bit-width weights and membrane potentials result in a notable performance decline. In this study, we first identify a new underlying cause for this decline: the loss of historical information due to the quantized membrane potential. To tackle this issue, we introduce a memory-free quantization method that captures all historical information without directly storing membrane potentials, resulting in better performance with lower memory requirements. To further improve computational efficiency, we propose a parallel training and asynchronous inference framework that greatly increases training speed and energy efficiency. We combine the proposed memory-free quantization and parallel computation methods to develop a high-performance and efficient QSNN, named MFP-QSNN. Extensive experiments show that MFP-QSNN achieves state-of-the-art performance on various static and neuromorphic image datasets while requiring less memory and training faster. The efficiency and efficacy of MFP-QSNN highlight its potential for energy-efficient neuromorphic computing.
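To make the "loss of historical information" point concrete, here is a minimal sketch, assuming a standard soft-reset LIF neuron; the function names and the 2-bit quantizer are illustrative and not taken from the paper. The baseline rounds the stored membrane potential every step, discarding part of the decayed input history, while the memory-free form rebuilds the pre-spike potential from the unrolled recurrence u_t = sum_{k<=t} lam^(t-k) x_k - theta * sum_{k<t} lam^(t-k) s_k, so no quantized state is ever carried between steps:

```python
import numpy as np

def quantize(u, bits=2, scale=1.0):
    # Symmetric uniform quantizer for the membrane potential (illustrative).
    q = 2 ** (bits - 1) - 1
    return np.clip(np.round(u / scale), -q, q) * scale

def lif_stored_quantized_state(inputs, lam=0.5, theta=1.0, bits=2):
    """Baseline: carry a stored, quantized membrane potential.
    Each rounding step discards part of the decayed input history."""
    u, spikes = 0.0, []
    for x in inputs:
        u = quantize(lam * u + x, bits)   # history lost to rounding
        s = float(u >= theta)
        u -= theta * s                    # soft reset
        spikes.append(s)
    return spikes

def lif_memory_free(inputs, lam=0.5, theta=1.0):
    """Memory-free flavour: rebuild the pre-spike potential from the raw
    input and spike history, so no quantized state is stored."""
    spikes = []
    for t in range(len(inputs)):
        u = sum(lam ** (t - k) * inputs[k] for k in range(t + 1)) \
            - theta * sum(lam ** (t - k) * spikes[k] for k in range(t))
        spikes.append(float(u >= theta))
    return spikes
```

The unrolled form is exactly the full-precision soft-reset LIF dynamics written without an explicit state variable, which is also what makes a parallel-over-time implementation possible.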
Related papers
- Time-independent Spiking Neuron via Membrane Potential Estimation for Efficient Spiking Neural Networks [4.142699381024752]
The computational inefficiency of spiking neural networks (SNNs) stems primarily from the sequential updates of membrane potentials. We propose Membrane Potential Estimation Parallel Spiking Neurons (MPE-PSN), a parallel computation method for spiking neurons. Our approach shows promise for enhancing computational efficiency, particularly at high neuron densities.
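The sequential bottleneck this summary refers to is easy to see in code: the plain LIF recurrence u_t = lam * u_{t-1} + x_t is a linear scan, so in the reset-free case all T potentials can be produced by one matrix product with a lower-triangular decay kernel. A minimal sketch (the reset interaction, which MPE-PSN actually has to estimate, is deliberately omitted here):

```python
import numpy as np

def parallel_potentials(inputs, lam=0.5):
    """All reset-free membrane potentials at once: u = D @ x, where
    D[t, k] = lam**(t - k) for k <= t (lower-triangular decay kernel)."""
    idx = np.arange(len(inputs))
    D = np.tril(lam ** (idx[:, None] - idx[None, :]))
    return D @ np.asarray(inputs, dtype=float)

def sequential_potentials(inputs, lam=0.5):
    """The step-by-step loop the parallel form replaces."""
    u, out = 0.0, []
    for x in inputs:
        u = lam * u + x
        out.append(u)
    return np.array(out)

x = np.random.rand(16)
assert np.allclose(parallel_potentials(x), sequential_potentials(x))
```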
arXiv Detail & Related papers (2024-09-08T05:14:22Z)
- OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration [5.0389804644646174]
PIM architectures often struggle to achieve high throughput and energy efficiency due to internal data movement bottlenecks.
We introduce OPIMA, an optical processing-in-memory (PIM)-based machine learning accelerator that targets this bottleneck.
We show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
arXiv Detail & Related papers (2024-07-11T06:12:04Z)
- Q-SNNs: Quantized Spiking Neural Networks [12.719590949933105]
Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an event-driven manner. We introduce a lightweight and hardware-friendly Quantized SNN that applies quantization to both synaptic weights and membrane potentials. We present a new Weight-Spike Dual Regulation (WS-DR) method inspired by information entropy theory.
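The summary specifies quantization of both synaptic weights and membrane potentials; a generic low-bit step under that reading looks as follows (the WS-DR regulation itself is not sketched, since the summary gives no details, and nothing here is Q-SNN's exact scheme):

```python
import numpy as np

def uquant(x, bits):
    # Symmetric uniform quantizer shared by weights and potentials.
    q = 2 ** (bits - 1) - 1
    s = np.abs(x).max() / q + 1e-12
    return np.round(x / s).clip(-q, q) * s

def qsnn_layer_step(w, x, u, lam=0.5, theta=1.0, bits_w=2, bits_u=2):
    """One illustrative QSNN step with low-bit weights *and* a low-bit
    membrane potential."""
    u = uquant(lam * u + uquant(w, bits_w) @ x, bits_u)
    s = (u >= theta).astype(float)
    return s, u - theta * s               # spikes, reset potential
```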
arXiv Detail & Related papers (2024-06-19T16:23:26Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, still fall short in terms of speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z)
- Full-Stack Optimization for CAM-Only DNN Inference [2.0837295518447934]
This paper explores the combination of algorithmic optimizations for ternary-weight neural networks and associative processors (APs).
We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity.
Our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators.
arXiv Detail & Related papers (2024-01-23T10:27:38Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without significant computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
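As a rough sketch of the memory-token idea described above (layer sizes and the single attention block are assumptions, not the paper's architecture), learnable tokens are simply concatenated to the input sequence and attended over jointly:

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Learnable memory tokens prepended to the input sequence."""
    def __init__(self, dim=64, n_mem=8, n_heads=4):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(1, n_mem, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq, dim)
        mem = self.mem.expand(x.size(0), -1, -1)
        z = torch.cat([mem, x], dim=1)          # prepend memory tokens
        out, _ = self.attn(z, z, z)
        return out[:, mem.size(1):]             # drop memory positions
```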
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
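The shared-backbone/multi-head structure the summary names is the standard ensemble-over-heads pattern; a minimal sketch (sizes and the averaging rule are assumptions, not MEMTL's exact design):

```python
import torch
import torch.nn as nn

class MultiHeadEnsemble(nn.Module):
    """One shared backbone feeding several prediction heads (PHs),
    whose outputs are averaged into an ensemble prediction."""
    def __init__(self, d_in=16, d_hidden=32, d_out=4, n_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(d_hidden, d_out) for _ in range(n_heads))

    def forward(self, x):
        z = self.backbone(x)                    # shared features
        return torch.stack([h(z) for h in self.heads]).mean(0)
```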
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Sharing Leaky-Integrate-and-Fire Neurons for Memory-Efficient Spiking Neural Networks [9.585985556876537]
Non-linear activation of Leaky-Integrate-and-Fire (LIF) neurons requires additional memory to store the membrane voltage that captures the temporal dynamics of spikes.
We propose a simple and effective solution, EfficientLIF-Net, which shares the LIF neurons across different layers and channels.
Our EfficientLIF-Net achieves comparable accuracy with the standard SNNs while bringing up to 4.3X forward memory efficiency and 21.9X backward memory efficiency for LIF neurons.
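A rough reading of the sharing idea: instead of one membrane-state tensor per layer, a single LIF state buffer serves several layers, which is where the memory saving comes from. The sketch below illustrates that reading only; EfficientLIF-Net's exact sharing scheme may differ:

```python
import numpy as np

class SharedLIF:
    """A single membrane-state buffer reused by several layers."""
    def __init__(self, lam=0.5, theta=1.0):
        self.lam, self.theta, self.u = lam, theta, None

    def __call__(self, x):
        self.u = np.zeros_like(x) if self.u is None else self.u
        self.u = self.lam * self.u + x          # integrate
        s = (self.u >= self.theta).astype(float)
        self.u -= self.theta * s                # soft reset
        return s

lif = SharedLIF()
h = np.random.rand(8)
for w in (np.random.rand(8, 8), np.random.rand(8, 8)):  # two "layers"
    h = lif(w @ h)    # both layers update the same membrane buffer
```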
arXiv Detail & Related papers (2023-05-26T22:55:26Z)
- MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MF-NeRF achieves the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
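A minimal sketch of the interplay being studied: prune by magnitude first, then quantize the surviving weights, so the quantizer's scale reflects the weights that actually remain (bit-width, sparsity, and the magnitude criterion are illustrative choices, not the paper's exact recipe):

```python
import numpy as np

def fake_quant(w, bits=4):
    # Symmetric uniform quantizer; in training, gradients would pass
    # through the rounding via a straight-through estimator.
    q = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / q + 1e-12
    return np.round(w / scale).clip(-q, q) * scale

def quant_aware_prune(w, bits=4, sparsity=0.5):
    """Magnitude pruning followed by quantization of the survivors."""
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w).ravel())[k]
    mask = (np.abs(w) >= thresh).astype(w.dtype)
    return fake_quant(w * mask, bits) * mask
```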
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- FSpiNN: An Optimization Framework for Memory- and Energy-Efficient Spiking Neural Networks [14.916996986290902]
Spiking Neural Networks (SNNs) offer unsupervised learning capability due to the spike-timing-dependent plasticity (STDP) rule.
However, state-of-the-art SNNs require a large memory footprint to achieve high accuracy.
We propose FSpiNN, an optimization framework for obtaining memory- and energy-efficient SNNs for training and inference processing.
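For reference, the pairwise STDP kernel the summary invokes is the textbook rule below (time constants and amplitudes are generic, not FSpiNN-specific): causal pre-before-post spike pairs potentiate a synapse, anti-causal pairs depress it.

```python
import numpy as np

def stdp_update(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pairwise STDP weight change for dt = t_post - t_pre (ms)."""
    return np.where(dt > 0, a_plus, -a_minus) * np.exp(-np.abs(dt) / tau)
```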
arXiv Detail & Related papers (2020-07-17T09:40:26Z)