Spiking Wavelet Transformer
- URL: http://arxiv.org/abs/2403.11138v3
- Date: Tue, 26 Mar 2024 02:40:04 GMT
- Title: Spiking Wavelet Transformer
- Authors: Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu,
- Abstract summary: Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by mimicking the event-driven processing of the brain.
It is incompetent to capture high-frequency patterns like moving edge and pixel-level brightness changes due to their reliance on global self-attention operations.
We propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner.
- Score: 1.8712213089437697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by mimicking the event-driven processing of the brain. Incorporating the Transformers with SNNs has shown promise for accuracy, yet it is incompetent to capture high-frequency patterns like moving edge and pixel-level brightness changes due to their reliance on global self-attention operations. Porting frequency representations in SNN is challenging yet crucial for event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) spiking wavelet learner for spatial-frequency domain learning, 2) convolution-based learner for spatial feature extraction, and 3) spiking pointwise convolution for cross-channel information aggregation. We also adopt negative spike dynamics to strengthen the frequency representation further. This enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer's effectiveness in capturing spatial-frequency patterns in a multiplication-free, event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves an over 50% reduction in energy consumption, a 21.1% reduction in parameter count, and a 2.40% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers.
Related papers
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves 98.33%, 96.65%, and 96.62% of percentage of correct classification (PCC) on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z) - Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning [30.51005522218133]
We introduce a novel Spiking Tucker Fusion Transformer (STFT) for audio-visual zero-shot learning (ZSL)
The STFT leverage the temporal and semantic information from different time steps to generate robust representations.
We propose a global-local pooling (GLP) which combines the max and average pooling operations.
arXiv Detail & Related papers (2024-07-11T02:01:26Z) - Attention-free Spikformer: Mixing Spike Sequences with Simple Linear
Transforms [16.54314950692779]
Spikformer integrates self-attention capability and the biological properties of Spiking Neural Networks (SNNs)
It introduces a Spiking Self-Attention (SSA) module to mix sparse visual features using spike-form Query, Key, and Value.
We conduct extensive experiments on image classification using both neuromorphic and static datasets.
arXiv Detail & Related papers (2023-08-02T11:41:54Z) - Adaptive Frequency Filters As Efficient Global Token Mixers [100.27957692579892]
We show that adaptive frequency filters can serve as efficient global token mixers.
We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet.
arXiv Detail & Related papers (2023-07-26T07:42:28Z) - Learning Spatial-Frequency Transformer for Visual Object Tracking [15.750739748843744]
Recent trackers adopt the Transformer to combine or replace the widely used ResNet as their new backbone network.
We believe these operations ignore the spatial prior of the target object which may lead to sub-optimal results.
We propose a unified Spatial-Frequency Transformer that models the spatial Prior and High-frequency emphasis Attention (GPHA) simultaneously.
arXiv Detail & Related papers (2022-08-18T13:46:12Z) - Wave-ViT: Unifying Wavelet and Transformers for Visual Representation
Learning [138.29273453811945]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks.
We propose a new Wavelet Vision Transformer (textbfWave-ViT) that formulates the invertible down-sampling with wavelet transforms and self-attention learning.
arXiv Detail & Related papers (2022-07-11T16:03:51Z) - Inception Transformer [151.939077819196]
Inception Transformer, or iFormer, learns comprehensive features with both high- and low-frequency information in visual data.
We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation.
arXiv Detail & Related papers (2022-05-25T17:59:54Z) - Trainable Wavelet Neural Network for Non-Stationary Signals [0.0]
This work introduces a wavelet neural network to learn a filter-bank specialized to fit non-stationary signals and improve interpretability and performance for digital signal processing.
The network uses a wavelet transform as the first layer of a neural network where the convolution is a parameterized function of the complex Morlet wavelet.
arXiv Detail & Related papers (2022-05-06T16:41:27Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT)
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.