Spike-driven Transformer
- URL: http://arxiv.org/abs/2307.01694v1
- Date: Tue, 4 Jul 2023 13:00:18 GMT
- Title: Spike-driven Transformer
- Authors: Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li
- Abstract summary: Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm.
In this paper, we incorporate the spike-driven paradigm into the Transformer via the proposed Spike-driven Transformer, which has four unique properties.
It is shown that the Spike-driven Transformer can achieve 77.1% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field.
- Score: 31.931401322707995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking Neural Networks (SNNs) provide an energy-efficient deep learning
option due to their unique spike-based event-driven (i.e., spike-driven)
paradigm. In this paper, we incorporate the spike-driven paradigm into the
Transformer via the proposed Spike-driven Transformer, which has four unique
properties: 1) Event-driven: no computation is triggered when the input of the
Transformer is zero; 2) Binary spike communication: all matrix multiplications
associated with the spike matrix can be transformed into sparse additions; 3)
Self-attention with linear complexity in both the token and channel dimensions;
4) The operations between the spike-form Query, Key, and Value are mask and
addition. Together, only sparse addition operations remain in the Spike-driven
Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA),
which uses only mask and addition operations without any multiplication and
thus has up to $87.2\times$ lower computation energy than vanilla
self-attention. In particular, in SDSA the matrix multiplication between Query,
Key, and Value is realized as a mask operation. In addition, we rearrange all
residual connections in the vanilla Transformer so that they precede the
activation functions, ensuring that all neurons transmit binary spike signals.
It is shown that the
Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K,
which is the state-of-the-art result in the SNN field. The source code is
available at https://github.com/BICLab/Spike-Driven-Transformer.
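To make the mask-and-addition idea concrete, below is a minimal NumPy sketch of a spike-driven attention step in the spirit of SDSA as described in the abstract. It is not the authors' implementation (see the linked repository for that): the firing threshold of the stand-in spiking neuron, the reduction axis, and the toy shapes are illustrative assumptions.

```python
# Minimal sketch of the mask-and-addition idea behind Spike-Driven
# Self-Attention (SDSA), based only on the abstract above. NOT the authors'
# implementation; the firing threshold and reduction axis are assumptions.
import numpy as np

def spike(x, threshold=1.0):
    # Stand-in spiking neuron: emit a binary spike when the input reaches
    # the threshold, otherwise stay silent.
    return (x >= threshold).astype(np.float32)

def sdsa_sketch(Q, K, V):
    # Q, K, V: binary spike matrices of shape (N tokens, D channels).
    # Because every entry is 0 or 1, the Hadamard product Q * K acts as a
    # mask (logical AND), and the column-wise reduction needs only
    # additions, so no real-valued multiply-accumulate is required.
    attn = (Q * K).sum(axis=0, keepdims=True)  # additions only, shape (1, D)
    gate = spike(attn)                         # binary per-channel gate
    return V * gate                            # mask V with the spike gate

# Toy usage with random spike inputs: 4 tokens, 8 channels.
rng = np.random.default_rng(0)
Q, K, V = (rng.integers(0, 2, size=(4, 8)).astype(np.float32) for _ in range(3))
print(sdsa_sketch(Q, K, V))
```

The cost of this step grows linearly with both the number of tokens and the number of channels, matching property 3, and keeping Q, K, and V binary is what the rearranged residual connections described above are intended to preserve.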
Related papers
- Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation [105.22961467028234]
Skip connections and normalisation layers are ubiquitous for the training of Deep Neural Networks (DNNs).
Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them.
But these approaches are incompatible with the self-attention layers present in transformers.
arXiv Detail & Related papers (2023-02-20T21:26:25Z)
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs [6.9136984255301]
We present ByteTransformer, a high-performance transformer boosted for variable-length inputs.
ByteTransformer surpasses the state-of-the-art Transformer frameworks, such as PyTorch JIT, XLA, Tencent TurboTransformer and NVIDIA FasterTransformer.
arXiv Detail & Related papers (2022-10-06T16:57:23Z)
- HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling [126.89573619301953]
We propose a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT).
HiViT enjoys both high efficiency and good performance in MIM.
When running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B.
arXiv Detail & Related papers (2022-05-30T09:34:44Z)
- Block-Recurrent Transformers [49.07682696216708]
We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence.
Our recurrent cell operates on blocks of tokens rather than single tokens, and leverages parallel computation within a block in order to make efficient use of accelerator hardware.
arXiv Detail & Related papers (2022-03-11T23:44:33Z)
- Transformer with a Mixture of Gaussian Keys [31.91701434633319]
Multi-head attention is a driving force behind state-of-the-art transformers.
Transformer-MGK replaces redundant heads in transformers with a mixture of keys at each head.
Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires fewer FLOPs to compute.
arXiv Detail & Related papers (2021-10-16T23:43:24Z)
- Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition [20.93536420298548]
We propose a new non-autoregressive transformer with a unified bidirectional decoder (NAT-UBD).
NAT-UBD can achieve character error rates (CERs) of 5.0%/5.5% on the Aishell1 dev/test sets, outperforming all previous NAR transformer models.
arXiv Detail & Related papers (2021-09-14T13:39:39Z)
- Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer [51.79399904527525]
We propose Smart Bird, which is an efficient and effective Transformer with learnable sparse attention.
In Smart Bird, we first compute a sketched attention matrix with a single-head low-dimensional Transformer.
We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads (see the sketch after this list).
arXiv Detail & Related papers (2021-08-20T14:22:00Z)
- Token Shift Transformer for Video Classification [34.05954523287077]
The Transformer achieves remarkable success in understanding 1- and 2-dimensional signals.
Its encoders naturally contain computationally intensive operations such as pairwise self-attention.
This paper presents the Token Shift Module (i.e., TokShift) for modeling temporal relations within each transformer encoder.
arXiv Detail & Related papers (2021-08-05T08:04:54Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT achieves 3.8% higher top-1 accuracy than the vanilla ViT.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
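As referenced in the Smart Bird entry above, the following is a minimal NumPy sketch of its two-step recipe as summarized there: a cheap sketched attention matrix followed by per-head sampling of token pairs. The low-dimensional projections standing in for the single-head low-dimensional Transformer, the sampling counts, and all shapes are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of Smart Bird's two-step sparse attention, as summarized in
# the related-papers list above. All shapes and the low-dimensional
# projection used for the sketch are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_sketch = 16, 64, 8
n_heads, pairs_per_head = 4, 32

# Step 1: a cheap "sketched" attention matrix from low-dimensional
# projections (standing in for the paper's single-head low-dimensional
# Transformer).
X = rng.normal(size=(n_tokens, d_model))
Wq = rng.normal(size=(d_model, d_sketch)) / np.sqrt(d_model)
Wk = rng.normal(size=(d_model, d_sketch)) / np.sqrt(d_model)
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d_sketch)
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # distribution over token pairs

# Step 2: sample token pairs per head to build different sparse attention
# index matrices; each head later attends only where its mask is True.
flat = probs.ravel()
head_masks = []
for _ in range(n_heads):
    chosen = rng.choice(flat.size, size=pairs_per_head, replace=False, p=flat)
    mask = np.zeros_like(probs, dtype=bool)
    mask[np.unravel_index(chosen, probs.shape)] = True
    head_masks.append(mask)

print([int(m.sum()) for m in head_masks])     # 32 attended pairs per head
```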
This list is automatically generated from the titles and abstracts of the papers in this site.