Spiking Vision Transformer with Saccadic Attention
- URL: http://arxiv.org/abs/2502.12677v1
- Date: Tue, 18 Feb 2025 09:32:29 GMT
- Title: Spiking Vision Transformer with Saccadic Attention
- Authors: Shuai Wang, Malu Zhang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Yu Liang, Yimeng Shan, Qian Sun, Enqi Zhang, Yang Yang,
- Abstract summary: Spiking Neural Networks (SNNs) and Vision Transformers (NNTs) hold potential for achieving both energy efficiency and high performance.
We first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between vanilla selfattention mechanism and temporal spike trains.
We introduce an innovative Saccadic Spike Self-Attention (SSSA) method to address these issues.
Building on the SSSA mechanism, we develop a SNN-based Vision Transformer (SNN-ViT)
- Score: 14.447083381772375
- License:
- Abstract: The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole scene understanding through temporal interactions. Building on the SSSA mechanism, we develop a SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.
Related papers
- Pedestrian intention prediction in Adverse Weather Conditions with Spiking Neural Networks and Dynamic Vision Sensors [0.0699049312989311]
This study examines the effectiveness of Spiking Neural Networks (SNNs) paired with Dynamic Vision Sensors (DVS) to improve pedestrian detection in adverse weather.
We assess the efficiency of SNNs compared to traditional Convolutional Neural Networks (CNNs)
The results indicate that SNNs, integrated with DVS, significantly reduce computational overhead and improve detection accuracy in challenging conditions compared to CNNs.
arXiv Detail & Related papers (2024-06-01T15:58:24Z) - Fully Spiking Actor Network with Intra-layer Connections for
Reinforcement Learning [51.386945803485084]
We focus on the task where the agent needs to learn multi-dimensional deterministic policies to control.
Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected layer.
To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects.
arXiv Detail & Related papers (2024-01-09T07:31:34Z) - Temporal Contrastive Learning for Spiking Neural Networks [23.963069990569714]
Biologically inspired neural networks (SNNs) have garnered considerable attention due to their low-energy consumption and better-temporal information processing capabilities.
We propose a novel method to obtain SNNs with low latency and high performance by incorporating contrastive supervision with temporal domain information.
arXiv Detail & Related papers (2023-05-23T10:31:46Z) - Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism.
We propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer)
arXiv Detail & Related papers (2022-09-29T14:16:49Z) - A Spatial-channel-temporal-fused Attention for Spiking Neural Networks [7.759491656618468]
Spiking neural networks (SNNs) mimic computational strategies, and exhibit substantial capabilities in processing information.
We propose a new spatial-channel-temporal-fused attention (SCTFA) module that can guide SNNs to efficiently capture underlying target regions.
arXiv Detail & Related papers (2022-09-22T07:45:55Z) - On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have emerged a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data.
There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations.
This work delves deep into the intrinsic structures of SNNs, by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z) - TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks [22.965024490694525]
Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency and powerful-temporal information representation ability.
We present a Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN.
The proposed TCJA-SNN framework can effectively assess the significance of spike sequence from both spatial and temporal dimensions.
arXiv Detail & Related papers (2022-06-21T08:16:08Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for
Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z) - Spiking Neural Networks for Visual Place Recognition via Weighted
Neuronal Assignments [24.754429120321365]
Spiking neural networks (SNNs) offer both compelling potential advantages, including energy efficiency and low latencies.
One promising area for high performance SNNs is template matching and image recognition.
This research introduces the first high performance SNN for the Visual Place Recognition (VPR) task.
arXiv Detail & Related papers (2021-09-14T05:40:40Z) - A journey in ESN and LSTM visualisations on a language task [77.34726150561087]
We trained ESNs and LSTMs on a Cross-Situationnal Learning (CSL) task.
The results are of three kinds: performance comparison, internal dynamics analyses and visualization of latent space.
arXiv Detail & Related papers (2020-12-03T08:32:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.