Spiking Wavelet Transformer
- URL: http://arxiv.org/abs/2403.11138v5
- Date: Wed, 4 Sep 2024 08:57:24 GMT
- Title: Spiking Wavelet Transformer
- Authors: Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu
- Abstract summary: Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning.
Incorporating Transformers into SNNs has shown promise for accuracy, but these models struggle to learn high-frequency patterns.
We propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner.
- Score: 1.8712213089437697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the brain's event-driven processing. Incorporating Transformers into SNNs has shown promise for accuracy. However, these models struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation, with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer's effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
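For intuition, below is a minimal PyTorch sketch of the FATM's three-branch layout under stated assumptions: a fixed Haar transform stands in for the learnable spiking wavelet, a hard threshold stands in for LIF neurons (so the sketch is forward-only), the layer choices are illustrative, and the negative spike dynamics and multiplication-free arithmetic of the real model are not reproduced.

```python
import torch
import torch.nn as nn

def haar_dwt2d(x):
    # Single-level orthonormal 2D Haar analysis on (B, C, H, W); H and W must be even.
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2d(ll, lh, hl, hh):
    # Exact inverse of haar_dwt2d.
    a, b = (ll + lh + hl + hh) / 2, (ll + lh - hl - hh) / 2
    c, d = (ll - lh + hl - hh) / 2, (ll - lh - hl + hh) / 2
    x = torch.zeros(*ll.shape[:-2], ll.shape[-2] * 2, ll.shape[-1] * 2,
                    dtype=ll.dtype, device=ll.device)
    x[..., 0::2, 0::2], x[..., 0::2, 1::2] = a, b
    x[..., 1::2, 0::2], x[..., 1::2, 1::2] = c, d
    return x

def spike(x, threshold=1.0):
    # Hard-threshold stand-in for a LIF neuron (no surrogate gradient).
    return (x >= threshold).float()

class FATMSketch(nn.Module):
    # Three parallel branches whose outputs are summed; the specific layer
    # choices (depthwise 3x3 convs, Haar basis) are assumptions for illustration.
    def __init__(self, dim):
        super().__init__()
        self.freq_conv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # 1) on wavelet subbands
        self.spatial_conv = nn.Conv2d(dim, dim, 3, padding=1)           # 2) spatial learner
        self.pointwise = nn.Conv2d(dim, dim, 1)                         # 3) cross-channel aggregation

    def forward(self, x):  # x: (B, C, H, W) binary spike maps
        # Wavelet branch: analyze, filter each subband, synthesize.
        # (The paper additionally uses negative spike dynamics here, omitted for brevity.)
        freq = haar_idwt2d(*(self.freq_conv(s) for s in haar_dwt2d(x)))
        out = freq + self.spatial_conv(x) + self.pointwise(x)
        return spike(out)
```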
Related papers
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves percentages of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning [30.51005522218133]
We introduce a novel Spiking Tucker Fusion Transformer (STFT) for audio-visual zero-shot learning (ZSL).
The STFT leverages the temporal and semantic information from different time steps to generate robust representations.
We propose a global-local pooling (GLP) module that combines the max and average pooling operations, as sketched below.
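A minimal sketch of the GLP idea, blending global max and average pooling; the learnable mixing weight `alpha` is an assumption, since the abstract does not specify how the two operations are combined:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalPooling(nn.Module):
    # Blends global max pooling and global average pooling; the learnable
    # mixing weight is an assumption about how the two are combined.
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.0))  # sigmoid(0) = 0.5: start balanced

    def forward(self, x):  # x: (B, C, H, W)
        a = torch.sigmoid(self.alpha)
        return a * F.adaptive_max_pool2d(x, 1) + (1 - a) * F.adaptive_avg_pool2d(x, 1)
```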
arXiv Detail & Related papers (2024-07-11T02:01:26Z)
- Attention-free Spikformer: Mixing Spike Sequences with Simple Linear Transforms [16.54314950692779]
Spikformer integrates the self-attention capability with the biological properties of Spiking Neural Networks (SNNs).
It introduces a Spiking Self-Attention (SSA) module that mixes sparse visual features using spike-form Query, Key, and Value, as sketched below.
We conduct extensive experiments on image classification using both neuromorphic and static datasets.
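A minimal sketch of the SSA idea: because spike-form Q, K, and V are non-negative binary tensors, the attention map stays non-negative and the softmax can be dropped. The hard-threshold spike function and the scale value are simplifications of Spikformer's LIF neurons:

```python
import torch
import torch.nn as nn

def spike(x, threshold=1.0):
    # Hard-threshold stand-in for LIF neurons (forward-only, no surrogate gradient).
    return (x >= threshold).float()

class SpikingSelfAttention(nn.Module):
    # Q, K, V are binary spike tensors, so Q @ K^T @ V is non-negative and
    # no softmax is needed; the scaling factor is an illustrative choice.
    def __init__(self, dim, scale=0.125):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = scale

    def forward(self, x):  # x: (B, N, C) spike features
        Q, K, V = spike(self.q(x)), spike(self.k(x)), spike(self.v(x))
        return spike((Q @ K.transpose(-2, -1)) @ V * self.scale)
```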
arXiv Detail & Related papers (2023-08-02T11:41:54Z)
- WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability [31.791279777902957]
Recent works show that learning attention in the Fourier space can improve the long sequence learning capability of Transformers.
We argue that the wavelet transform is a better choice because it captures both position and frequency information with linear time complexity.
We propose Wavelet Space Attention (WavSpA), which facilitates attention learning in a learnable wavelet coefficient space; a sketch follows.
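A minimal sketch of attention in wavelet space, with a fixed single-level Haar transform standing in for the paper's learnable wavelet basis; `dim` must be divisible by `heads`:

```python
import torch
import torch.nn as nn

class WaveletSpaceAttention(nn.Module):
    # Analyze the sequence into low/high Haar coefficients, attend over the
    # stacked coefficients, then synthesize back to the original space.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N even
        lo = (x[:, 0::2] + x[:, 1::2]) / 2 ** 0.5   # Haar analysis
        hi = (x[:, 0::2] - x[:, 1::2]) / 2 ** 0.5
        coeffs = torch.cat([lo, hi], dim=1)          # (B, N, C) in wavelet space
        coeffs, _ = self.attn(coeffs, coeffs, coeffs)
        lo, hi = coeffs.chunk(2, dim=1)
        out = torch.empty_like(x)                    # Haar synthesis
        out[:, 0::2] = (lo + hi) / 2 ** 0.5
        out[:, 1::2] = (lo - hi) / 2 ** 0.5
        return out
```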
arXiv Detail & Related papers (2022-10-05T02:37:59Z)
- Learning Spatial-Frequency Transformer for Visual Object Tracking [15.750739748843744]
Recent trackers adopt the Transformer to combine or replace the widely used ResNet as their new backbone network.
We believe these operations ignore the spatial prior of the target object which may lead to sub-optimal results.
We propose a unified Spatial-Frequency Transformer that models a Gaussian spatial Prior and High-frequency emphasis Attention (GPHA) simultaneously.
arXiv Detail & Related papers (2022-08-18T13:46:12Z)
- Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning [138.29273453811945]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks.
We propose a new Wavelet Vision Transformer (Wave-ViT) that formulates the invertible down-sampling with wavelet transforms and self-attention learning.
arXiv Detail & Related papers (2022-07-11T16:03:51Z)
- Inception Transformer [151.939077819196]
Inception Transformer, or iFormer, learns comprehensive features with both high- and low-frequency information in visual data.
We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation.
arXiv Detail & Related papers (2022-05-25T17:59:54Z)
- Trainable Wavelet Neural Network for Non-Stationary Signals [0.0]
This work introduces a wavelet neural network that learns a filter bank specialized for non-stationary signals, improving interpretability and performance in digital signal processing.
The network uses a wavelet transform as its first layer, where the convolution kernels are parameterized functions of the complex Morlet wavelet; a sketch follows.
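A minimal sketch of such a layer: each kernel is rebuilt every forward pass from a learnable center frequency and bandwidth. The paper uses the complex Morlet wavelet; this sketch keeps only its real part, and the parameter names and initial values are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MorletConv1d(nn.Module):
    # First-layer filter bank whose kernels are Morlet wavelets generated
    # from learnable per-filter parameters instead of free conv weights.
    def __init__(self, out_channels, kernel_size=65):
        super().__init__()
        self.omega = nn.Parameter(torch.linspace(0.1, 3.0, out_channels))        # center freq (rad/sample)
        self.sigma = nn.Parameter(torch.full((out_channels,), kernel_size / 8))  # bandwidth
        t = torch.arange(kernel_size).float() - kernel_size // 2
        self.register_buffer("t", t)

    def forward(self, x):  # x: (B, 1, L)
        t = self.t[None, :]                                        # (1, K)
        envelope = torch.exp(-t ** 2 / (2 * self.sigma[:, None] ** 2))
        kernels = envelope * torch.cos(self.omega[:, None] * t)    # (C_out, K)
        return F.conv1d(x, kernels[:, None, :], padding=kernels.shape[-1] // 2)
```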
arXiv Detail & Related papers (2022-05-06T16:41:27Z)
- Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
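A simplified sketch of Fourier-domain token mixing in the AFNO spirit: FFT over tokens, a block-diagonal linear map per frequency mode, soft-thresholding for sparsity, inverse FFT. True AFNO applies a shared two-layer MLP to complex values; treating the real and imaginary parts with separate weights here is a simplification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierTokenMixer(nn.Module):
    # Mixing cost is dominated by the FFT, so it scales near-linearly
    # with sequence length instead of quadratically like self-attention.
    def __init__(self, dim, num_blocks=4, lambd=0.01):
        super().__init__()
        assert dim % num_blocks == 0
        bd = dim // num_blocks
        self.w = nn.Parameter(0.02 * torch.randn(2, num_blocks, bd, bd))  # real/imag weights
        self.b = nn.Parameter(0.02 * torch.randn(2, num_blocks, bd))      # real/imag biases
        self.lambd = lambd

    def forward(self, x):  # x: (B, N, C) tokens
        B, N, C = x.shape
        nb, bd = self.w.shape[1], self.w.shape[-1]
        X = torch.fft.rfft(x, dim=1, norm="ortho").reshape(B, -1, nb, bd)
        real = torch.einsum("bnkd,kde->bnke", X.real, self.w[0]) + self.b[0]
        imag = torch.einsum("bnkd,kde->bnke", X.imag, self.w[1]) + self.b[1]
        X = torch.view_as_real(torch.complex(real, imag))
        X = torch.view_as_complex(F.softshrink(X, lambd=self.lambd))  # sparsify modes
        return torch.fft.irfft(X.reshape(B, -1, C), n=N, dim=1, norm="ortho")
```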
arXiv Detail & Related papers (2021-11-24T05:44:31Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
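The core mechanism is compact enough to sketch: a 2D FFT over the token grid, element-wise multiplication with a learnable complex filter, and an inverse FFT. This is a sketch consistent with the paper's description, not its exact code:

```python
import torch
import torch.nn as nn

class GlobalFilter(nn.Module):
    # FFT -> learnable global filter -> inverse FFT; the FFT gives the
    # log-linear complexity in the number of tokens cited above.
    def __init__(self, h, w, dim):
        super().__init__()
        # rfft2 keeps w // 2 + 1 columns; filter stored as (real, imag) pairs
        self.filter = nn.Parameter(0.02 * torch.randn(h, w // 2 + 1, dim, 2))

    def forward(self, x):  # x: (B, H, W, C)
        X = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        X = X * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(X, s=x.shape[1:3], dim=(1, 2), norm="ortho")
```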
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided convolution, and average-pooling with the Discrete Wavelet Transform (DWT), as sketched below.
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
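A minimal sketch of DWT-based downsampling: keep only the low-frequency (LL) band of an orthonormal Haar DWT in place of a stride-2 pooling layer. The paper evaluates several wavelet families; Haar is the simplest case:

```python
import torch
import torch.nn as nn

class HaarDownsample(nn.Module):
    # Stride-2 downsampling via the LL band of a single-level orthonormal
    # Haar DWT; the noise-prone high-frequency bands are discarded.
    def forward(self, x):  # x: (B, C, H, W), H and W even
        a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
        c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
        return (a + b + c + d) / 2
```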
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.