SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
- URL: http://arxiv.org/abs/2503.15986v2
- Date: Wed, 23 Jul 2025 08:42:32 GMT
- Title: SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
- Authors: Zeqi Zheng, Yanchen Huang, Yingchao Yu, Zizheng Zhu, Junfeng Tang, Zhaofei Yu, Yaochu Jin,
- Abstract summary: Spiking Neural Networks (SNNs) based on Transformers have garnered significant attention due to their superior performance and high energy efficiency. We propose a Lateral Inhibition-inspired Spiking Transformer (SpiLiFormer) to address the issue of over-allocating attention to irrelevant contexts. SpiLiFormer emulates the brain's lateral inhibition mechanism, guiding the model to enhance attention to relevant tokens while suppressing attention to irrelevant ones.
- Score: 29.724968607408048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spiking Neural Networks (SNNs) based on Transformers have garnered significant attention due to their superior performance and high energy efficiency. However, the spiking attention modules of most existing Transformer-based SNNs are adapted from those of analog Transformers, failing to fully address the issue of over-allocating attention to irrelevant contexts. To fix this fundamental yet overlooked issue, we propose a Lateral Inhibition-inspired Spiking Transformer (SpiLiFormer). It emulates the brain's lateral inhibition mechanism, guiding the model to enhance attention to relevant tokens while suppressing attention to irrelevant ones. Our model achieves state-of-the-art (SOTA) performance across multiple datasets, including CIFAR-10 (+0.45%), CIFAR-100 (+0.48%), CIFAR10-DVS (+2.70%), N-Caltech101 (+1.94%), and ImageNet-1K (+1.6%). Notably, on the ImageNet-1K dataset, SpiLiFormer (69.9M parameters, 4 time steps, 384 resolution) outperforms E-SpikeFormer (173.0M parameters, 8 time steps, 384 resolution), a SOTA spiking Transformer, by 0.46% using only 39% of the parameters and half the time steps. The code and model checkpoints are publicly available at https://github.com/KirinZheng/SpiLiFormer.
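The repository above hosts the authors' code; as a loose illustration of the lateral-inhibition idea only (not SpiLiFormer's actual module), the sketch below reweights a standard attention map so that above-average entries are amplified and below-average ones are suppressed before the values are aggregated. The function name, the `inhibition_strength` parameter, and the mean-based inhibition rule are illustrative assumptions.
```python
# Illustrative lateral-inhibition-style attention reweighting (PyTorch).
# NOT the SpiLiFormer implementation; a minimal sketch of the general idea:
# strengthen attention to strongly attended tokens, suppress the rest.
import torch
import torch.nn.functional as F

def lateral_inhibition_attention(q, k, v, inhibition_strength=0.5):
    """q, k, v: (batch, heads, tokens, head_dim) float tensors."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (B, H, N, N)
    attn = scores.softmax(dim=-1)
    # Push each attention row away from its mean: above-average entries grow,
    # below-average entries shrink (clipped at zero), then renormalize the row.
    mean_attn = attn.mean(dim=-1, keepdim=True)
    inhibited = F.relu(attn + inhibition_strength * (attn - mean_attn))
    inhibited = inhibited / inhibited.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.matmul(inhibited, v)

# Toy usage: 2 samples, 4 heads, 16 tokens, 32-dim heads.
q = k = v = torch.randn(2, 4, 16, 32)
print(lateral_inhibition_attention(q, k, v).shape)  # torch.Size([2, 4, 16, 32])
```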
Related papers
- Diversity-Guided MLP Reduction for Efficient Large Vision Transformers [54.656502058570226]
Transformer models exhibit an excellent scaling property, where performance improves as model capacity increases. However, large-scale model parameters lead to an unaffordable cost of computing and memory. We propose a Diversity-Guided MLP Reduction (DGMR) method to significantly reduce the parameters of large vision transformers.
arXiv Detail & Related papers (2025-06-10T08:59:27Z)
- Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer [15.93436166506258]
Spiking Neural Networks have emerged as a promising energy-efficient alternative to traditional Artificial Neural Networks.
This paper introduces Accurate Addition-Only Spiking Self-Attention (A$^2$OS$^2$A).
arXiv Detail & Related papers (2025-02-28T22:23:29Z)
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- Optimizing the Deployment of Tiny Transformers on Low-Power MCUs [12.905978154498499]
This work aims to enable and optimize the flexible, multi-platform deployment of encoder Tiny Transformers on commercial MCUs.
Our framework provides an optimized library of kernels to maximize data reuse and avoid data marshaling operations into the crucial attention block.
We show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention can reduce the runtime by 1.53x and the number of parameters by 25%.
arXiv Detail & Related papers (2024-04-03T14:14:08Z)
- QKFormer: Hierarchical Spiking Transformer using Q-K Attention [39.55446999753786]
Spiking Transformers integrate Spiking Neural Networks (SNNs) with Transformer architectures.
We introduce several innovations to improve the performance of existing models.
We develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training.
arXiv Detail & Related papers (2024-03-25T08:57:27Z)
- Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation [73.31524865643709]
We present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D pose estimation from videos.
Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in only a few pose tokens in the intermediate transformer blocks.
Our method can achieve both high efficiency and estimation accuracy compared to the original VPT models.
arXiv Detail & Related papers (2023-11-20T18:59:51Z)
- SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer [12.717450255837178]
Spiking Neural Networks (SNNs) have the advantages of low power consumption and high energy efficiency.
The most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance.
We present SparseSpikformer, a co-design framework aimed at achieving sparsity in Spikformer through token and weight pruning techniques.
arXiv Detail & Related papers (2023-11-15T09:22:52Z)
- Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network [19.932683405796126]
Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks.
SNNs suffer from non-spike computations caused by the structure of their residual connection.
We develop Spikingformer, a pure transformer-based spiking neural network.
arXiv Detail & Related papers (2023-04-24T09:44:24Z)
- Fcaformer: Forward Cross Attention in Hybrid Vision Transformer [29.09883780571206]
We propose forward cross attention for a hybrid vision transformer (FcaFormer).
Our FcaFormer achieves 83.1% top-1 accuracy on ImageNet with only 16.3 million parameters and about 3.6 billion MACs.
This saves almost half of the parameters at slightly lower computational cost while achieving 0.7% higher accuracy compared to the distilled EfficientFormer.
arXiv Detail & Related papers (2022-11-14T08:43:44Z)
- Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism.
We propose a novel Spiking Self Attention (SSA) mechanism as well as a powerful framework, named Spiking Transformer (Spikformer); a generic sketch of softmax-free spiking attention is given after this list.
arXiv Detail & Related papers (2022-09-29T14:16:49Z)
- Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criterion that is comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z)
- CMT: Convolutional Neural Networks Meet Vision Transformers [68.10025999594883]
Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.
There are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs).
We propose a new transformer based hybrid network by taking advantage of transformers to capture long-range dependencies, and of CNNs to model local features.
In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller on FLOPs than the existing DeiT and EfficientNet, respectively.
arXiv Detail & Related papers (2021-07-13T17:47:19Z)
- Conformer: Convolution-augmented Transformer for Speech Recognition [60.119604551507805]
Recently, Transformer and Convolutional Neural Network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR).
We propose the convolution-augmented transformer for speech recognition, named Conformer.
On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on the test/test-other sets.
arXiv Detail & Related papers (2020-05-16T20:56:25Z)
- End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture.
We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation.
arXiv Detail & Related papers (2020-02-10T16:29:26Z)
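Several entries above (Spikformer, QKFormer, SparseSpikformer, and the addition-only attention paper) revolve around softmax-free spiking self-attention, where Q, K, and V are binary spike tensors and the attention map is a plain matrix product. The sketch below is a generic, hedged illustration of that pattern; the thresholded random inputs stand in for a real spiking neuron (e.g. LIF) and the fixed scale factor is an assumption, not any specific paper's formulation.
```python
# Generic softmax-free spiking self-attention sketch (PyTorch), in the spirit
# of the spiking Transformers listed above. The attention map Q K^T is
# integer-valued because Q and K are binary spike tensors, so no softmax or
# floating-point exponentials are needed.
import torch

def spiking_self_attention(q_spikes, k_spikes, v_spikes, scale=0.125):
    """q/k/v_spikes: (batch, heads, tokens, head_dim) tensors in {0, 1}."""
    attn = torch.matmul(q_spikes, k_spikes.transpose(-2, -1))  # (B, H, N, N)
    return torch.matmul(attn, v_spikes) * scale

# Toy spike tensors produced by a hard threshold; a stand-in for a proper
# spiking neuron, used here only to keep the example self-contained.
q = (torch.rand(2, 4, 16, 32) > 0.8).float()
k = (torch.rand(2, 4, 16, 32) > 0.8).float()
v = (torch.rand(2, 4, 16, 32) > 0.8).float()
print(spiking_self_attention(q, k, v).shape)  # torch.Size([2, 4, 16, 32])
```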
This list is automatically generated from the titles and abstracts of the papers on this site.