AutoST: Training-free Neural Architecture Search for Spiking
Transformers
- URL: http://arxiv.org/abs/2307.00293v2
- Date: Thu, 14 Dec 2023 00:58:03 GMT
- Title: AutoST: Training-free Neural Architecture Search for Spiking
Transformers
- Authors: Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
- Abstract summary: Spiking Transformers achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers.
Existing Spiking Transformer architectures exhibit a notable architectural gap, resulting in suboptimal performance.
We introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures.
- Score: 14.791412391584064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking Transformers have gained considerable attention because they achieve
both the energy efficiency of Spiking Neural Networks (SNNs) and the high
capacity of Transformers. However, the existing Spiking Transformer
architectures, derived from Artificial Neural Networks (ANNs), exhibit a
notable architectural gap, resulting in suboptimal performance compared to
their ANN counterparts. Manually discovering optimal architectures is
time-consuming. To address these limitations, we introduce AutoST, a
training-free NAS method for Spiking Transformers, to rapidly identify
high-performance Spiking Transformer architectures. Unlike existing
training-free NAS methods, which struggle with the non-differentiability and
high sparsity inherent in SNNs, we propose to utilize Floating-Point Operations
(FLOPs) as a performance metric, which is independent of model computations and
training dynamics, leading to a stronger correlation with performance. Our
extensive experiments show that AutoST models outperform state-of-the-art
manually or automatically designed SNN architectures on static and neuromorphic
datasets. Full code, model, and data are released for reproduction.
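To make the search procedure concrete, below is a minimal, hypothetical sketch of a training-free search that ranks randomly sampled Spiking Transformer configurations by an estimated FLOPs score under a budget. The search space, the FLOPs formula, and the budget are illustrative assumptions, not the paper's actual settings.

```python
import random

# Hypothetical search space for a Spiking Transformer (illustrative only;
# the paper defines its own dimensions such as embedding size, depth, heads, MLP ratio).
SEARCH_SPACE = {
    "embed_dim": [192, 256, 384, 512],
    "depth":     [2, 4, 6, 8],
    "num_heads": [4, 8],
    "mlp_ratio": [2, 4],
}

def sample_architecture(rng: random.Random) -> dict:
    """Draw one candidate architecture uniformly from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_flops(arch: dict, num_tokens: int = 196) -> float:
    """Rough FLOPs estimate for a stack of Transformer blocks (assumed formula):
    attention ~ 4*N*D^2 + 2*N^2*D and MLP ~ 2*N*D^2*mlp_ratio per layer."""
    n, d = num_tokens, arch["embed_dim"]
    attn = 4 * n * d * d + 2 * n * n * d
    mlp = 2 * n * d * d * arch["mlp_ratio"]
    return arch["depth"] * (attn + mlp)

def training_free_search(num_candidates: int = 1000,
                         flops_budget: float = 1e9,
                         seed: int = 0) -> dict:
    """Score candidates by estimated FLOPs (the training-free proxy) and return
    the highest-scoring architecture that still fits within the FLOPs budget."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_candidates)]
    feasible = [a for a in candidates if estimate_flops(a) <= flops_budget]
    return max(feasible, key=estimate_flops)

if __name__ == "__main__":
    best = training_free_search()
    print(best, f"{estimate_flops(best):.3e} estimated FLOPs")
```

No gradient computation or forward pass is needed here, which is the point of a training-free proxy: every candidate is scored analytically from its configuration alone.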
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this challenge by shifting data analysis to the edge.

Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
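As a rough illustration of turning a 1D behavioral signal into a 2D time-frequency tensor with a continuous wavelet transform, here is a generic sketch using PyWavelets; it is not the paper's exact preprocessing, and the sampling rate, scale range, and Morlet wavelet choice are assumptions.

```python
import numpy as np
import pywt  # assumed dependency: PyWavelets

def signal_to_cwt_tensor(signal: np.ndarray, fs: float = 128.0,
                         num_scales: int = 64) -> np.ndarray:
    """Map a 1D signal of shape (T,) to a 2D tensor of shape (num_scales, T)
    via the continuous wavelet transform with a Morlet wavelet."""
    scales = np.arange(1, num_scales + 1)
    coeffs, _freqs = pywt.cwt(signal, scales, "morl", sampling_period=1.0 / fs)
    return np.abs(coeffs)  # magnitude scalogram, ready to feed a 2D convolutional stream

if __name__ == "__main__":
    t = np.linspace(0, 2, 256)
    x = np.sin(2 * np.pi * 8 * t) + 0.5 * np.random.randn(t.size)  # toy signal
    print(signal_to_cwt_tensor(x).shape)  # (64, 256)
```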
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks [22.665939536001797]
We propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method.
Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer.
We show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts.
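For background on what a spiking self-attention computes, the sketch below shows a generic Spikformer-style formulation that drops the softmax because its inputs are binary spike tensors. It is not the paper's DSSA; the shapes and the fixed scaling constant are assumptions.

```python
import torch

def spiking_self_attention(q_spikes: torch.Tensor,
                           k_spikes: torch.Tensor,
                           v_spikes: torch.Tensor,
                           scale: float = 0.125) -> torch.Tensor:
    """Generic spike-based self-attention: inputs are binary spike tensors of
    shape (batch, tokens, dim); attention is Q K^T V with a fixed scale and no
    softmax, so the attention map stays integer-valued and addition-dominated."""
    attn = q_spikes @ k_spikes.transpose(-2, -1)  # (batch, tokens, tokens)
    return (attn @ v_spikes) * scale              # (batch, tokens, dim)

if __name__ == "__main__":
    b, n, d = 2, 16, 32
    # Toy binary "spike" tensors standing in for LIF neuron outputs.
    q = (torch.rand(b, n, d) > 0.8).float()
    k = (torch.rand(b, n, d) > 0.8).float()
    v = (torch.rand(b, n, d) > 0.8).float()
    print(spiking_self_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```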
arXiv Detail & Related papers (2024-03-21T11:16:42Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
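As a usage-level illustration, here is a minimal sketch of building and simulating a tiny SNN, assuming the `activation_based` API of a recent SpikingJelly release; the layer sizes, time steps, and input shape are arbitrary.

```python
import torch
import torch.nn as nn
from spikingjelly.activation_based import neuron, functional, surrogate

# A tiny fully connected SNN: LIF neurons trained with a surrogate gradient.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 100),
    neuron.LIFNode(tau=2.0, surrogate_function=surrogate.ATan()),
    nn.Linear(100, 10),
    neuron.LIFNode(tau=2.0, surrogate_function=surrogate.ATan()),
)

T = 4                          # number of simulation time steps
x = torch.rand(8, 1, 28, 28)   # toy batch standing in for MNIST-like input

# Feed the same input at every time step, average the output spikes,
# then reset all membrane states before the next sample.
out = sum(net(x) for _ in range(T)) / T
functional.reset_net(net)
print(out.shape)  # torch.Size([8, 10])
```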
arXiv Detail & Related papers (2023-10-25T13:15:17Z)
- Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices [3.809702129519641]
New deep neural network (DNN) architectures and approaches are emerging every few years, driving the field's advancement.
Transformers are a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges.
This work takes steps toward bridging this gap by examining the current state of on-device Transformer execution.
arXiv Detail & Related papers (2023-06-20T10:15:01Z)
- Auto-Spikformer: Spikformer Architecture Search [22.332981906087785]
Self-attention mechanisms have been integrated into Spiking Neural Networks (SNNs).
Recent advancements in SNN architecture, such as Spikformer, have demonstrated promising outcomes.
We propose Auto-Spikformer, a one-shot Transformer Architecture Search (TAS) method, which automates the quest for an optimized Spikformer architecture.
arXiv Detail & Related papers (2023-06-01T15:35:26Z)
- Training-free Neural Architecture Search for RNNs and Transformers [0.0]
We develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture.
We find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search.
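The sketch below shows one way a hidden-state-based, training-free score could be computed for untrained RNN candidates: collect final hidden states over a random probe batch, form their covariance matrix, and use its log-determinant as a diversity score. This is an illustrative interpretation of a hidden-covariance-style metric, not the paper's exact formula; the probe batch and scoring rule are assumptions.

```python
import torch
import torch.nn as nn

def hidden_covariance_score(rnn: nn.RNNBase, inputs: torch.Tensor) -> float:
    """Score an *untrained* RNN by the covariance structure of its final hidden
    states over a batch (illustrative proxy only).
    inputs: (batch, seq_len, input_size); higher scores mean more diverse states."""
    with torch.no_grad():
        _, h_n = rnn(inputs)                 # h_n: (num_layers, batch, hidden)
        h = h_n[-1]                          # last layer: (batch, hidden)
        h = h - h.mean(dim=0, keepdim=True)  # center across the batch
        cov = (h.T @ h) / (h.shape[0] - 1)   # (hidden, hidden) covariance matrix
        eigvals = torch.linalg.eigvalsh(cov).clamp_min(1e-8)
        return eigvals.log().sum().item()    # log-determinant as the score

if __name__ == "__main__":
    probe = torch.randn(256, 20, 16)   # random probe batch: (batch, seq, features)
    for hidden in (32, 64, 128):       # rank a few candidate widths without training
        cand = nn.RNN(input_size=16, hidden_size=hidden, batch_first=True)
        print(hidden, round(hidden_covariance_score(cand, probe), 2))
```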
arXiv Detail & Related papers (2023-06-01T02:06:13Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
Efficiently training SNNs is challenging because spike generation is non-differentiable.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance with low latency.
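The non-differentiability refers to the Heaviside spike function. As background only, the sketch below shows the common surrogate-gradient workaround: an exact step function in the forward pass and a smooth stand-in derivative in the backward pass. This is the generic surrogate-gradient trick, not the paper's DSR method, and the surrogate shape and constant are assumptions.

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate in the backward pass.
    (Generic surrogate-gradient trick for illustration, not the paper's DSR method.)"""

    @staticmethod
    def forward(ctx, membrane_potential: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential >= 0.0).float()  # exact spike: non-differentiable step

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        (u,) = ctx.saved_tensors
        # Derivative of a fast sigmoid replaces the step's zero-or-undefined gradient.
        surrogate_grad = 1.0 / (1.0 + 10.0 * u.abs()) ** 2
        return grad_output * surrogate_grad

if __name__ == "__main__":
    u = torch.randn(5, requires_grad=True)  # toy membrane potentials
    spikes = SpikeFunction.apply(u)         # forward: binary spikes
    spikes.sum().backward()                 # backward: gradient flows via the surrogate
    print(spikes, u.grad)
```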
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.