SpikePool: Event-driven Spiking Transformer with Pooling Attention
- URL: http://arxiv.org/abs/2510.12102v1
- Date: Tue, 14 Oct 2025 03:08:49 GMT
- Title: SpikePool: Event-driven Spiking Transformer with Pooling Attention
- Authors: Donghyun Lee, Alex Sima, Yuhang Li, Panos Stinis, Priyadarshini Panda
- Abstract summary: Spiking Neural Networks (SNNs) have increasingly been integrated with transformer architectures. Current approaches primarily focus on architectural modifications without analyzing the underlying signal processing characteristics. We analyze spiking transformers through the frequency spectrum domain and discover that they behave as high-pass filters. We propose SpikePool, which replaces spike-based self-attention with max pooling attention, a low-pass filtering operation, to create a selective band-pass filtering effect.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building on the success of transformers, Spiking Neural Networks (SNNs) have increasingly been integrated with transformer architectures, leading to spiking transformers that demonstrate promising performance on event-based vision tasks. However, despite these empirical successes, there remains limited understanding of how spiking transformers fundamentally process event-based data. Current approaches primarily focus on architectural modifications without analyzing the underlying signal processing characteristics. In this work, we analyze spiking transformers through the frequency spectrum domain and discover that they behave as high-pass filters, contrasting with Vision Transformers (ViTs) that act as low-pass filters. This frequency domain analysis reveals why certain designs work well for event-based data, which contains valuable high-frequency information but is also sparse and noisy. Based on this observation, we propose SpikePool, which replaces spike-based self-attention with max pooling attention, a low-pass filtering operation, to create a selective band-pass filtering effect. This design preserves meaningful high-frequency content while capturing critical features and suppressing noise, achieving a better balance for event-based data processing. Our approach demonstrates competitive results on event-based datasets for both classification and object detection tasks while significantly reducing training and inference time by up to 42.5% and 32.8%, respectively.
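The core substitution the abstract describes, replacing spike-based self-attention with a max pooling operation over the token grid, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification for intuition only: the function name, the flat `(N, C)` token layout, and the single-step dense values are assumptions, and the paper's actual SpikePool block (spiking dynamics over time steps, learned projections) is more involved.

```python
import numpy as np

def maxpool_token_mixing(x, grid, k=3):
    """Sketch of pooling attention: mix tokens with a sliding-window
    max pool over their spatial grid instead of self-attention.

    x    : (N, C) array of token features, N = H * W
    grid : (H, W) spatial arrangement of the tokens
    k    : odd pooling window size
    """
    H, W = grid
    N, C = x.shape
    assert N == H * W, "token count must match the grid"
    xg = x.reshape(H, W, C)
    pad = k // 2
    # Pad with -inf so border windows ignore out-of-grid positions.
    xp = np.pad(xg, ((pad, pad), (pad, pad), (0, 0)),
                mode="constant", constant_values=-np.inf)
    out = np.empty_like(xg)
    for i in range(H):
        for j in range(W):
            # Each token takes the channel-wise max of its k x k neighborhood.
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out.reshape(N, C)

# A single active ("spiking") token spreads only to its local window:
x = np.zeros((16, 1))
x[5, 0] = 1.0                      # token at grid position (1, 1) on a 4x4 grid
out = maxpool_token_mixing(x, (4, 4), k=3)
```

Because each output token depends only on a local neighborhood, this mixing attenuates fine-grained (high-frequency) fluctuations, which is the low-pass behavior the abstract attributes to max pooling attention.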
Related papers
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking [51.31378940976401]
Existing RGB-Event tracking approaches fail to fully exploit the unique advantages of event cameras. We propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method.
arXiv Detail & Related papers (2026-01-03T01:10:17Z) - Frequency-Dynamic Attention Modulation for Dense Prediction [14.066404173580864]
We propose a circuit-theory-inspired strategy called Frequency-Dynamic Attention Modulation (FDAM), which directly modulates the overall frequency response of Vision Transformers (ViTs).
arXiv Detail & Related papers (2025-07-16T07:59:54Z) - Self-Bootstrapping for Versatile Test-Time Adaptation [29.616417768209114]
We develop a versatile test-time adaptation (TTA) objective for a variety of tasks. We achieve this through a self-bootstrapping scheme that optimizes prediction consistency between the test image (as target) and its deteriorated view. Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks.
arXiv Detail & Related papers (2025-04-10T05:45:07Z) - Frequency Domain Enhanced U-Net for Low-Frequency Information-Rich Image Segmentation in Surgical and Deep-Sea Exploration Robots [34.28684917337352]
We address the differences in frequency band sensitivity between CNNs and the human visual system. We propose a wavelet adaptive spectrum fusion (WASF) method inspired by biological vision mechanisms to balance cross-frequency image features. We develop the FE-UNet model, which employs a SAM2 backbone network and incorporates fine-tuned Hiera-Large modules to ensure segmentation accuracy.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Spiking Wavelet Transformer [1.8712213089437697]
Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning.
Transformers with SNNs have shown promise for accuracy, but struggle to learn high-frequency patterns.
We propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner.
arXiv Detail & Related papers (2024-03-17T08:41:48Z) - Enhancing Traffic Prediction with Learnable Filter Module [42.44466196331814]
Noise in traffic data can be challenging to model due to its nature and can lead to overfitting risks.
We propose a learnable filter module to filter out noise in traffic data adaptively.
We demonstrate that the proposed module is lightweight, easy to integrate with existing models, and can significantly improve traffic prediction performance.
arXiv Detail & Related papers (2023-10-24T09:16:13Z) - Learning Spatial-Frequency Transformer for Visual Object Tracking [15.750739748843744]
Recent trackers adopt the Transformer to combine or replace the widely used ResNet as their new backbone network.
We believe these operations ignore the spatial prior of the target object which may lead to sub-optimal results.
We propose a unified Spatial-Frequency Transformer that models the spatial prior and high-frequency emphasis attention (GPHA) simultaneously.
arXiv Detail & Related papers (2022-08-18T13:46:12Z) - Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning [138.29273453811945]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks.
We propose a new Wavelet Vision Transformer (Wave-ViT) that formulates the invertible down-sampling with wavelet transforms and self-attention learning.
arXiv Detail & Related papers (2022-07-11T16:03:51Z) - Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance gains, outperforming state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z) - Towards Data-Efficient Detection Transformers [77.43470797296906]
We show most detection transformers suffer from significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z) - WaveTransform: Crafting Adversarial Examples via Input Decomposition [69.01794414018603]
We introduce 'WaveTransform', which creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately or in combination.
Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.
arXiv Detail & Related papers (2020-10-29T17:16:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.