Related papers: QKFormer: Hierarchical Spiking Transformer using Q-K Attention

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

URL: http://arxiv.org/abs/2403.16552v1
Date: Mon, 25 Mar 2024 08:57:27 GMT
Title: QKFormer: Hierarchical Spiking Transformer using Q-K Attention
Authors: Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian,
Abstract summary: Spiking Transformers integrate Spiking Neural Networks (SNNs) with Transformer architectures. We introduce several innovations to improve the performance of existing models. We develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training.
Score: 39.55446999753786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer

Related papers

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition [29.724968607408048]
Spiking Neural Networks (SNNs) based on Transformers have garnered significant attention due to their superior performance and high energy efficiency.<n>We propose a Lateral Inhibition-inspired Spiking Transformer (SpiLiFormer) to address the issue of over-allocating attention to irrelevant contexts.<n>SpiLiFormer emulates the brain's lateral inhibition mechanism, guiding the model to enhance attention to relevant tokens while suppressing attention to irrelevant ones.
arXiv Detail & Related papers (2025-03-20T09:36:31Z)
Towards High-performance Spiking Transformers from ANN to SNN Conversion [43.53538629484375]
Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing capabilities, and robustness. Current conversion methods mainly focus on converting convolutional neural networks (CNNs) to SNNs. In this paper, we propose an Expectation Compensation Module to preserve accuracy of the conversion.
arXiv Detail & Related papers (2025-02-28T16:12:37Z)
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training [17.193023656793464]
The ambition of brain-inspired Spiking Neural Networks (SNNs) is to become a low-power alternative to traditional Artificial Neural Networks (ANNs) This work addresses two major challenges in realizing this vision: the performance gap between SNNs and ANNs, and the high training costs of SNNs. We identify intrinsic flaws in spiking neurons caused by binary firing mechanisms and propose a Spike Firing Approximation (SFA) method using integer training and spike-driven inference.
arXiv Detail & Related papers (2024-11-25T03:05:41Z)
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve. We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap. This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips [37.305308839310136]
Neuromorphic computing exploits Spiking Neural Networks (SNNs) on neuromorphic chips. CNN-based SNNs are the current mainstream of neuromorphic computing. No neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged.
arXiv Detail & Related papers (2024-02-15T13:26:18Z)
SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer [12.717450255837178]
Spiking Neural Network (SNN) has the advantages of low power consumption and high energy efficiency. The most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance. We present SparseSpikformer, a co-design framework aimed at achieving sparsity in Spikformer through token and weight pruning techniques.
arXiv Detail & Related papers (2023-11-15T09:22:52Z)
Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference [13.924924047051782]
Deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. This research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs) We propose an innovative spiking architecture that uses batch normalization to retain MFI compatibility. We establish an efficient multi-stage spiking network that blends effectively global receptive fields with local feature extraction.
arXiv Detail & Related papers (2023-06-21T16:52:20Z)
Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism. We propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer)
arXiv Detail & Related papers (2022-09-29T14:16:49Z)
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K. Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision. Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and Vision-Mixer, started to lead new trends. In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
CMT: Convolutional Neural Networks Meet Vision Transformers [68.10025999594883]
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. There are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs) We propose a new transformer based hybrid network by taking advantage of transformers to capture long-range dependencies, and of CNNs to model local features. In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller on FLOPs than the existing DeiT and EfficientNet, respectively.
arXiv Detail & Related papers (2021-07-13T17:47:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.