Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
- URL: http://arxiv.org/abs/2503.00226v2
- Date: Mon, 17 Mar 2025 03:17:00 GMT
- Title: Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
- Authors: Yufei Guo, Xiaode Liu, Yuanpei Chen, Weihang Peng, Yuhan Zhang, Zhe Ma
- Abstract summary: Spiking Neural Networks have emerged as a promising energy-efficient alternative to traditional Artificial Neural Networks. This paper introduces Accurate Addition-Only Spiking Self-Attention (A$^2$OS$^2$A).
- Score: 15.93436166506258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have demonstrated outstanding performance across a wide range of tasks, owing to their self-attention mechanism, but they are highly energy-consuming. Spiking Neural Networks have emerged as a promising energy-efficient alternative to traditional Artificial Neural Networks, leveraging event-driven computation and binary spikes for information transfer. The combination of Transformers' capabilities with the energy efficiency of SNNs offers a compelling opportunity. This paper addresses the challenge of adapting the self-attention mechanism of Transformers to the spiking paradigm by introducing a novel approach: Accurate Addition-Only Spiking Self-Attention (A$^2$OS$^2$A). Unlike existing methods that rely solely on binary spiking neurons for all components of the self-attention mechanism, our approach integrates binary, ReLU, and ternary spiking neurons. This hybrid strategy significantly improves accuracy while preserving non-multiplicative computations. Moreover, our method eliminates the need for softmax and scaling operations. Extensive experiments show that the A$^2$OS$^2$A-based Spiking Transformer outperforms existing SNN-based Transformers on several datasets, even achieving an accuracy of 78.66\% on ImageNet-1K. Our work represents a significant advancement in SNN-based Transformer models, offering a more accurate and efficient solution for real-world applications.
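To make the addition-only idea concrete, below is a minimal PyTorch sketch based only on the abstract above. The assignment of a binary neuron to the query, a ReLU neuron to the key, and a ternary neuron to the value, as well as the simple thresholding functions, are illustrative assumptions rather than the paper's verified design; the sketch only shows why dropping softmax and scaling leaves matrix products that reduce to signed additions.

```python
# Illustrative sketch of an addition-only spiking self-attention in the spirit of
# A^2OS^2A. Neuron-type assignments (binary Q, ReLU K, ternary V) are assumptions
# made for illustration, not the paper's confirmed layout.
import torch
import torch.nn as nn


def binary_spike(x: torch.Tensor) -> torch.Tensor:
    """Binary spiking neuron: emit 1 when the input exceeds 0, else 0."""
    return (x > 0).float()


def relu_spike(x: torch.Tensor) -> torch.Tensor:
    """ReLU neuron: non-negative real-valued output (rate-like)."""
    return torch.relu(x)


def ternary_spike(x: torch.Tensor, theta: float = 0.5) -> torch.Tensor:
    """Ternary spiking neuron: emit -1, 0, or +1 depending on the input."""
    return (x > theta).float() - (x < -theta).float()


class AdditionOnlySpikingSelfAttention(nn.Module):
    """Self-attention with no softmax and no scaling.

    Because Q is binary {0,1} and V is ternary {-1,0,1}, both matrix products
    reduce to (signed) additions: Q selects and sums entries of K, and V only
    flips signs or zeroes out entries of the attention map.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q = binary_spike(self.q_proj(x))             # values in {0, 1}
        k = relu_spike(self.k_proj(x))               # non-negative values
        v = ternary_spike(self.v_proj(x))            # values in {-1, 0, 1}
        attn = torch.matmul(q, k.transpose(-2, -1))  # binary Q: sums of selected K entries
        out = torch.matmul(attn, v)                  # ternary V: signed additions only
        return self.out_proj(out)                    # no softmax, no 1/sqrt(d) scaling


if __name__ == "__main__":
    layer = AdditionOnlySpikingSelfAttention(dim=64)
    tokens = torch.randn(2, 16, 64)
    print(layer(tokens).shape)  # torch.Size([2, 16, 64])
```

Note that this sketch ignores time steps and surrogate-gradient training, which a real SNN implementation would need; it is meant only to illustrate how the hybrid binary/ReLU/ternary choice keeps the attention computation multiplication-free.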
Related papers
- BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs). We propose BHViT, a binarization-friendly hybrid ViT architecture, together with its fully binarized model, guided by three important observations. Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z) - Towards High-performance Spiking Transformers from ANN to SNN Conversion [43.53538629484375]
Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing capabilities, and robustness. Current conversion methods mainly focus on converting convolutional neural networks (CNNs) to SNNs. In this paper, we propose an Expectation Compensation Module to preserve the accuracy of the conversion.
arXiv Detail & Related papers (2025-02-28T16:12:37Z) - Binary Event-Driven Spiking Transformer [36.815359983551986]
Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm. We propose the Binary Event-Driven Spiking Transformer, i.e., BESTformer. BESTformer suffers from a severe performance drop relative to its full-precision counterpart due to the limited representation capability of binarization.
arXiv Detail & Related papers (2025-01-10T12:00:11Z) - Combining Aggregated Attention and Transformer Architecture for Accurate and Efficient Performance of Spiking Neural Networks [44.145870290310356]
Spiking Neural Networks have attracted significant attention in recent years due to their distinctive low-power characteristics. Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across various domains. Despite the significant advantages of both SNNs and Transformers, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging.
arXiv Detail & Related papers (2024-12-18T07:07:38Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning strategy and fully exploiting the complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference [0.30104001512119216]
Building models with fast and energy-efficient inference is imperative to enable a variety of transformer-based applications.
We build on an approach for learning LUT networks directly via an Extended Finite Difference method.
This allows for a computational and energy-efficient inference solution for transformer-based models.
arXiv Detail & Related papers (2024-11-04T05:38:56Z) - SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks [22.665939536001797]
We propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method.
Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer.
We show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts.
arXiv Detail & Related papers (2024-03-21T11:16:42Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z) - Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, Transformers' need to incorporate contextual information in order to extract features dynamically has been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We prove that eliminating the MASK token and considering the whole output in the loss computation are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)