SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in
Spiking Transformer
- URL: http://arxiv.org/abs/2311.08806v1
- Date: Wed, 15 Nov 2023 09:22:52 GMT
- Authors: Yue Liu, Shanlin Xiao, Bo Li, Zhiyi Yu
- Abstract summary: Spiking Neural Networks (SNNs) offer low power consumption and high energy efficiency.
The most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance.
We present SparseSpikformer, a co-design framework aimed at achieving sparsity in Spikformer through token and weight pruning techniques.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the third-generation neural network, the Spiking Neural Network (SNN) has
the advantages of low power consumption and high energy efficiency, making it
suitable for implementation on edge devices. More recently, the most advanced
SNN, Spikformer, combines the self-attention module from Transformer with SNN
to achieve remarkable performance. However, it adopts larger channel dimensions
in MLP layers, leading to an increased number of redundant model parameters. To
effectively decrease the computational complexity and weight parameters of the
model, we explore the Lottery Ticket Hypothesis (LTH) and discover a very
sparse ($\ge$90%) subnetwork that achieves comparable performance to the
original network. Furthermore, we also design a lightweight token selector
module, which can remove unimportant background information from images based
on the average spike firing rate of neurons, selecting only essential
foreground image tokens to participate in attention calculation. Based on that,
we present SparseSpikformer, a co-design framework aimed at achieving sparsity
in Spikformer through token and weight pruning techniques. Experimental results
demonstrate that our framework can remove 90% of the model parameters
and cut Giga Floating-Point Operations (GFLOPs) by 20% while maintaining
the accuracy of the original model.
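The two ideas in the abstract, dropping tokens with a low average spike firing rate and keeping only a sparse subset of weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `[T][N][D]` spike layout (time steps, tokens, channels), the top-k selection rule, and the one-shot magnitude mask are all assumptions made for illustration. In particular, the Lottery Ticket Hypothesis procedure normally alternates training, magnitude pruning, and rewinding of the surviving weights; only the masking step is shown here.

```python
from typing import List, Sequence


def token_firing_rates(spikes: Sequence[Sequence[Sequence[int]]]) -> List[float]:
    """Average firing rate per token for binary spikes laid out as [T][N][D].

    The layout (time steps, tokens, channels) is an assumption of this sketch.
    """
    num_steps = len(spikes)
    num_tokens = len(spikes[0])
    num_channels = len(spikes[0][0])
    rates = []
    for n in range(num_tokens):
        fired = sum(
            spikes[t][n][d]
            for t in range(num_steps)
            for d in range(num_channels)
        )
        rates.append(fired / (num_steps * num_channels))
    return rates


def select_tokens(rates: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the highest-rate tokens, returned in their original order.

    Hypothetical top-k rule: low-rate tokens are treated as background.
    """
    num_keep = max(1, round(len(rates) * keep_ratio))
    ranked = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return sorted(ranked[:num_keep])


def magnitude_mask(weights: Sequence[float], sparsity: float) -> List[int]:
    """Binary mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    num_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


# Toy input: 2 time steps, 4 tokens, 2 channels.
spikes = [
    [[1, 1], [0, 0], [1, 0], [0, 0]],
    [[1, 0], [0, 0], [1, 1], [0, 1]],
]
rates = token_firing_rates(spikes)   # [0.75, 0.0, 0.75, 0.25]
kept = select_tokens(rates, 0.5)     # [0, 2]
mask = magnitude_mask([0.5, -0.1, 0.2, -0.9], 0.5)  # [1, 0, 0, 1]
```

Only the tokens in `kept` would take part in the attention calculation, and `mask` would be applied element-wise to the weights before each forward pass. How the paper's selector is parameterized and how its pruning schedule is set are not stated in the abstract, so those details are left out.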
Related papers
- MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation.
We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity.
MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
arXiv Detail & Related papers (2024-08-24T05:54:47Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Auto-Spikformer: Spikformer Architecture Search [22.332981906087785]
Self-attention mechanisms have been integrated into Spiking Neural Networks (SNNs).
Recent advancements in SNN architecture, such as Spikformer, have demonstrated promising outcomes.
We propose Auto-Spikformer, a one-shot Transformer Architecture Search (TAS) method, which automates the quest for an optimized Spikformer architecture.
arXiv Detail & Related papers (2023-06-01T15:35:26Z)
- Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network [19.932683405796126]
Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks.
SNNs suffer from non-spike computations caused by the structure of their residual connection.
We develop Spikingformer, a pure transformer-based spiking neural network.
arXiv Detail & Related papers (2023-04-24T09:44:24Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experimental results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism.
We propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer).
arXiv Detail & Related papers (2022-09-29T14:16:49Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and Vision-Mixer, started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)