Attention-free Spikformer: Mixing Spike Sequences with Simple Linear
Transforms
- URL: http://arxiv.org/abs/2308.02557v2
- Date: Thu, 17 Aug 2023 06:12:24 GMT
- Title: Attention-free Spikformer: Mixing Spike Sequences with Simple Linear
Transforms
- Authors: Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu
- Abstract summary: Spikformer integrates the self-attention capability and the biological properties of Spiking Neural Networks (SNNs).
It introduces a Spiking Self-Attention (SSA) module to mix sparse visual features using spike-form Query, Key, and Value.
We conduct extensive experiments on image classification using both neuromorphic and static datasets.
- Score: 16.54314950692779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By integrating the self-attention capability and the biological properties of
Spiking Neural Networks (SNNs), Spikformer applies the flourishing Transformer
architecture to SNN design. It introduces a Spiking Self-Attention (SSA)
module to mix sparse visual features using spike-form Query, Key, and Value,
resulting in the State-Of-The-Art (SOTA) performance on numerous datasets
compared to previous SNN-like frameworks. In this paper, we demonstrate that
the Spikformer architecture can be accelerated by replacing the SSA with an
unparameterized Linear Transform (LT) such as Fourier and Wavelet transforms.
These transforms are utilized to mix spike sequences, reducing the quadratic
time complexity to log-linear time complexity. They alternate between the
frequency and time domains to extract sparse visual features, showcasing
powerful performance and efficiency. We conduct extensive experiments on image
classification using both neuromorphic and static datasets. The results
indicate that compared to the SOTA Spikformer with SSA, Spikformer with LT
achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and
DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e.,
CIFAR-10 and CIFAR-100). Furthermore, Spikformer with LT achieves approximately
29-51% improvement in training speed, 61-70% improvement in inference speed,
and reduces memory usage by 4-26% due to not requiring learnable parameters.
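The core idea above, replacing the quadratic spiking self-attention product with an unparameterized Fourier transform that mixes spike sequences in O(N log N), can be illustrated with a minimal sketch. This is not the paper's implementation; the function name `lt_mix` and the FNet-style choice of keeping only the real part are illustrative assumptions.

```python
import numpy as np

def lt_mix(x: np.ndarray) -> np.ndarray:
    """Mix a spike tensor of shape (T, N, D): time steps, tokens, channels.

    A 2D FFT over the token and channel axes mixes information globally in
    O(N log N) time with no learnable parameters, standing in for the
    O(N^2) attention product. Keeping the real part (an FNet-style
    assumption, not confirmed by the paper) returns a real-valued tensor
    for downstream spiking neurons.
    """
    return np.fft.fft2(x, axes=(-2, -1)).real

rng = np.random.default_rng(0)
# Sparse binary spikes: 4 time steps, 8 tokens, 16 channels.
spikes = (rng.random((4, 8, 16)) < 0.2).astype(np.float32)
mixed = lt_mix(spikes)
print(mixed.shape)  # shape is preserved: (4, 8, 16)
```

Because the transform has no parameters, there is nothing to store or update during training, which is consistent with the memory and speed savings the abstract reports.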
Related papers
- Spiking Transformer with Spatial-Temporal Attention [26.7175155847563]
Spiking Neural Networks (SNNs) present a compelling and energy-efficient alternative to traditional Artificial Neural Networks (ANNs).
We introduce Spiking Transformer with Spatial-Temporal Attention (STAtten), a simple and straightforward architecture designed to integrate spatial and temporal information in self-attention.
We first verify our spatial-temporal attention mechanism's ability to capture long-term temporal dependencies using sequential datasets.
arXiv Detail & Related papers (2024-09-29T20:29:39Z)
- Spiking Wavelet Transformer [1.8712213089437697]
Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning.
Transformers with SNNs have shown promise for accuracy, but struggle to learn high-frequency patterns.
We propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner.
arXiv Detail & Related papers (2024-03-17T08:41:48Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism.
We propose a novel Spiking Self-Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer).
arXiv Detail & Related papers (2022-09-29T14:16:49Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- Improving STDP-based Visual Feature Learning with Whitening [1.9981375888949475]
In this paper, we propose to use whitening as a pre-processing step before learning features with STDP.
Experiments on CIFAR-10 show that whitening allows STDP to learn visual features that are closer to the ones learned with standard neural networks.
We also propose an approximation of whitening as convolution kernels that is computationally cheaper to learn and more suited to be implemented on neuromorphic hardware.
arXiv Detail & Related papers (2020-02-24T11:48:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.