STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
- URL: http://arxiv.org/abs/2505.11151v1
- Date: Fri, 16 May 2025 11:50:14 GMT
- Title: STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
- Authors: Sicheng Shen, Dongcheng Zhao, Linghao Feng, Zeyang Yue, Jindong Li, Tenglong Li, Guobin Shen, Yi Zeng
- Abstract summary: Spiking Transformers have emerged as promising architectures for combining the efficiency of spiking neural networks with the representational power of self-attention. We present a unified benchmark framework for Spiking Transformers that supports a wide range of tasks, including classification, segmentation, and detection. We propose a unified analytical model for energy estimation, accounting for spike sparsity, bitwidth, and memory access, and show that quantized ANNs may offer comparable or better energy efficiency.
- Score: 5.660272448194108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spiking Transformers have recently emerged as promising architectures for combining the efficiency of spiking neural networks with the representational power of self-attention. However, the lack of standardized implementations, evaluation pipelines, and consistent design choices has hindered fair comparison and principled analysis. In this paper, we introduce STEP, a unified benchmark framework for Spiking Transformers that supports a wide range of tasks, including classification, segmentation, and detection across static, event-based, and sequential datasets. STEP provides modular support for diverse components such as spiking neurons, input encodings, surrogate gradients, and multiple backends (e.g., SpikingJelly, BrainCog). Using STEP, we reproduce and evaluate several representative models, and conduct systematic ablation studies on attention design, neuron types, encoding schemes, and temporal modeling capabilities. We also propose a unified analytical model for energy estimation, accounting for spike sparsity, bitwidth, and memory access, and show that quantized ANNs may offer comparable or better energy efficiency. Our results suggest that current Spiking Transformers rely heavily on convolutional frontends and lack strong temporal modeling, underscoring the need for spike-native architectural innovations. The full code is available at: https://github.com/Fancyssc/STEP
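The abstract does not spell out the analytical energy model, so the sketch below is only a rough illustration of how a spike-sparsity-, bitwidth-, and memory-aware estimate of this kind is commonly assembled in the SNN literature. All constants, formulas, and names here (estimate_snn_energy, estimate_qann_energy, E_AC_32, E_MAC_32, E_MEM_32) are assumptions for illustration and are not taken from STEP or its code.

```python
# Illustrative-only sketch of a sparsity/bitwidth/memory-aware energy estimate.
# Constants and formulas are assumptions, NOT the model proposed in the paper.

# Rough 45 nm per-operation energy figures often quoted in the SNN literature (pJ).
E_AC_32 = 0.9     # 32-bit accumulate (assumed)
E_MAC_32 = 4.6    # 32-bit multiply-accumulate (assumed)
E_MEM_32 = 5.0    # moving one 32-bit word from on-chip memory (assumed placeholder)


def estimate_snn_energy(num_synaptic_ops: float,
                        spike_rate: float,
                        timesteps: int,
                        mem_accesses: float) -> float:
    """Energy (pJ) for a spiking layer: accumulates gated by spike sparsity,
    repeated over timesteps, plus a memory-access term."""
    compute = num_synaptic_ops * spike_rate * timesteps * E_AC_32
    memory = mem_accesses * E_MEM_32
    return compute + memory


def estimate_qann_energy(num_macs: float,
                         bitwidth: int,
                         mem_accesses: float) -> float:
    """Energy (pJ) for a quantized ANN layer: MAC and memory costs scaled
    (roughly linearly, an assumption) with bitwidth."""
    scale = bitwidth / 32.0
    compute = num_macs * E_MAC_32 * scale
    memory = mem_accesses * E_MEM_32 * scale
    return compute + memory


# Example: 1e9 synaptic ops at a 15% spike rate over 4 timesteps,
# compared against the same layer as an 8-bit quantized ANN.
snn = estimate_snn_energy(1e9, 0.15, 4, mem_accesses=5e7)
qann = estimate_qann_energy(1e9, 8, mem_accesses=5e7)
print(f"SNN ~{snn / 1e6:.1f} uJ, 8-bit ANN ~{qann / 1e6:.1f} uJ")
```

Under these assumed constants, the comparison hinges on the firing rate and timestep count: at high spike rates or many timesteps the accumulate-only advantage of the SNN erodes, which is consistent with the abstract's observation that quantized ANNs can be comparable or better in energy efficiency.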
Related papers
- Toward Efficient Spiking Transformers: Synapse Pruning Meets Synergistic Learning-Based Compensation [5.496016535669561]
We propose combining synapse pruning with a synergistic learning-based compensation strategy to derive lightweight Transformer-based models. Experiments on benchmark datasets demonstrate that the proposed methods significantly reduce model size and computational overhead while maintaining competitive performance.
arXiv Detail & Related papers (2025-08-04T02:19:38Z) - Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning [30.781578037476347]
We introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs). Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index. Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets.
arXiv Detail & Related papers (2025-03-03T09:12:14Z) - Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes. In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z) - CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
We propose Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Our work is capable of capturing long-range dependencies and effectively up-sampling feature maps. With a ResNet50 backbone, our method achieves a 92.02% Dice Score, highlighting the efficacy of our methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z) - Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers [10.566264033360282]
Post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile phones and TVs.
In this paper, we propose a novel PTQ algorithm that balances accuracy and efficiency.
arXiv Detail & Related papers (2024-02-14T05:58:43Z) - Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization [6.799413002613627]
Todyformer is a novel Transformer-based neural network tailored for dynamic graphs.
It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers.
We show that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks.
arXiv Detail & Related papers (2024-02-02T23:05:30Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and incurs negligible extra computational cost.
arXiv Detail & Related papers (2023-03-27T11:13:50Z) - Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
arXiv Detail & Related papers (2022-03-04T11:47:20Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)