Neural Architecture Search on Efficient Transformers and Beyond
- URL: http://arxiv.org/abs/2207.13955v1
- Date: Thu, 28 Jul 2022 08:41:41 GMT
- Title: Neural Architecture Search on Efficient Transformers and Beyond
- Authors: Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu,
Yiran Zhong
- Abstract summary: We propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.
We observe that the optimal architecture of the efficient Transformer has reduced computation compared with that of the standard Transformer.
Our searched architecture maintains comparable accuracy to the standard Transformer with notably improved computational efficiency.
- Score: 23.118556295894376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, numerous efficient Transformers have been proposed to reduce the
quadratic computational complexity of standard Transformers caused by the
Softmax attention. However, most of them simply swap Softmax with an efficient attention mechanism without considering architectures customized for the efficient attention. In this paper, we argue that the handcrafted
vanilla Transformer architectures for Softmax attention may not be suitable for
efficient Transformers. To address this issue, we propose a new framework to
find optimal architectures for efficient Transformers with the neural
architecture search (NAS) technique. The proposed method is validated on
popular machine translation and image classification tasks. We observe that the
optimal architecture of the efficient Transformer has reduced computation compared with that of the standard Transformer, but its overall accuracy falls short. This indicates that Softmax attention and efficient attention each have their own strengths, but neither can balance accuracy and efficiency well on its own. This motivates us to mix the two types of attention to reduce the performance imbalance. Besides the search spaces commonly used in existing NAS Transformer approaches, we propose a
new search space that allows the NAS algorithm to automatically search the
attention variants along with architectures. Extensive experiments on WMT'14
En-De and CIFAR-10 demonstrate that our searched architecture maintains
comparable accuracy to the standard Transformer with notably improved
computational efficiency.
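To make the attention distinction concrete, below is a minimal NumPy sketch (not the paper's implementation): Softmax attention materializes an n-by-n score matrix and is quadratic in sequence length, while a kernelized "linear" attention reorders the computation to be linear. The feature map and the `attn_type` flag are illustrative assumptions; the flag merely mimics the idea of exposing the attention variant as a searchable dimension alongside the architecture.

```python
# Sketch only: quadratic Softmax attention vs. linear (kernelized) attention,
# with a hypothetical per-layer "attn_type" choice a NAS controller could pick.
import numpy as np

def softmax_attention(Q, K, V):
    # (n, d) @ (d, n) -> (n, n): quadratic in sequence length n
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1.0):
    # Apply a positive feature map, then associate (K^T V) first: O(n * d^2)
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                                   # (d, d_v)
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T        # (n, 1) normalizer
    return (Qf @ KV) / (Z + 1e-6)

def attention_block(Q, K, V, attn_type="softmax"):
    # Hypothetical searchable choice: mix the two attention types across layers.
    return softmax_attention(Q, K, V) if attn_type == "softmax" else linear_attention(Q, K, V)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((128, 64)) for _ in range(3))
    print(attention_block(Q, K, V, "softmax").shape, attention_block(Q, K, V, "linear").shape)
```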
Related papers
- ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures [5.502117675161604]
Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability.
It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors.
We propose an algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis.
arXiv Detail & Related papers (2023-10-05T18:55:30Z)
- TurboViT: Generating Fast Vision Transformers via Generative Architecture Search [74.24393546346974]
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant research recently on the design of efficient vision transformer architecture.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
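As a hedged illustration of the decomposition idea behind HEAT (not the HEAT framework itself, which searches decompositions in a hardware-aware way), the snippet below factorizes a single linear layer's weight with a truncated SVD; the rank and layer size are arbitrary assumptions.

```python
# Sketch: replace one (d_out x d_in) matmul with two low-rank ones of rank r.
import numpy as np

def low_rank_factorize(W, rank):
    """Return (A, B) with A @ B ~= W, A: (d_out, r), B: (r, d_in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((768, 768))       # e.g. a BERT-sized projection
    A, B = low_rank_factorize(W, rank=64)
    x = rng.standard_normal(768)
    # Factorized forward pass: 2 * 768 * 64 multiply-adds vs. 768 * 768
    rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
    print(f"relative error at rank 64: {rel_err:.3f}")
```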
- An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers [11.811907838840712]
We propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by utilizing general N:M sparsity patterns.
We present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers.
Experimental results show that, compared to other methods, N:M sparse Transformers generated using IDP achieve an average of 6.7% improvement in accuracy with high training efficiency.
arXiv Detail & Related papers (2022-08-12T04:51:49Z)
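For context, here is a minimal sketch of an N:M sparsity pattern (it is not the paper's IDP training scheme or the STA hardware): within each group of M consecutive weights, only the N largest-magnitude entries are kept, e.g. the common 2:4 pattern.

```python
# Sketch: prune a weight matrix to an N:M pattern by magnitude.
import numpy as np

def nm_prune(W, n=2, m=4):
    """Zero all but the n largest-magnitude entries in each group of m."""
    flat = W.reshape(-1, m)                           # assumes W.size % m == 0
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]   # indices to keep per group
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(W.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))
    W_sparse = nm_prune(W, n=2, m=4)
    print((W_sparse != 0).reshape(-1, 4).sum(axis=1))  # every group keeps exactly 2
```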
- Are Transformers More Robust? Towards Exact Robustness Verification for Transformers [3.2259574483835673]
We study the robustness problem of Transformers, a key characteristic, as low robustness may cause safety concerns.
Specifically, we focus on Sparsemax-based Transformers and reduce the problem of finding their maximum robustness to a Mixed Integer Quadratically Constrained Programming (MIQCP) problem.
We then conduct experiments using a Lane Departure application to compare the robustness of Sparsemax-based Transformers against that of the more conventional Multi-Layer-Perceptron (MLP) NNs.
arXiv Detail & Related papers (2022-02-08T15:27:33Z)
- Transformer Acceleration with Dynamic Sparse Attention [20.758709319088865]
We propose the Dynamic Sparse Attention (DSA) that can efficiently exploit the dynamic sparsity in the attention of Transformers.
Our approach can achieve better trade-offs between accuracy and model complexity.
arXiv Detail & Related papers (2021-10-21T17:31:57Z)
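As a rough illustration of dynamic sparsity in attention (a sketch under simplifying assumptions, not the DSA method or its accelerator), the snippet below keeps only the top-k scores per query before the softmax, so most value rows never contribute to the output.

```python
# Sketch: top-k sparse attention as one simple form of dynamic sparsity.
import numpy as np

def topk_sparse_attention(Q, K, V, k=8):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n, n)
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    scores = np.where(scores >= kth, scores, -np.inf)  # drop all but top-k per row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((32, 16)) for _ in range(3))
    print(topk_sparse_attention(Q, K, V, k=4).shape)    # (32, 16)
```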
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT).
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
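The Toeplitz observation can be made concrete with a small sketch (assumed indexing and NumPy FFT, not the paper's kernelized-attention code): a Toeplitz matrix-vector product is computed in O(n log n) by embedding the matrix in a circulant one and convolving via the FFT, instead of the explicit O(n^2) matmul.

```python
# Sketch: FFT-based product with a Toeplitz matrix T[i, j] = r[i - j].
import numpy as np

def toeplitz_matvec_fft(r_neg, r_pos, x):
    """r_pos: lags 0..n-1, r_neg: lags -1..-(n-1), x: length n."""
    n = x.shape[0]
    # First column of a 2n circulant matrix that embeds the Toeplitz matrix.
    c = np.concatenate([r_pos, [0.0], r_neg[::-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad)).real
    return y[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    r_pos = rng.standard_normal(n)        # r[0..n-1]
    r_neg = rng.standard_normal(n - 1)    # r[-1..-(n-1)]
    x = rng.standard_normal(n)
    # Dense Toeplitz matrix for reference.
    T = np.array([[r_pos[i - j] if i >= j else r_neg[j - i - 1] for j in range(n)] for i in range(n)])
    assert np.allclose(T @ x, toeplitz_matvec_fft(r_neg, r_pos, x))
    print("FFT-based Toeplitz product matches the dense product")
```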
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder that drops the full attention implementation with softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
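A hedged sketch of the "query-key similarity only" idea (the paper's actual decoder design may differ): raw similarity scores between two images' features are pooled directly into a matching score, with no softmax weighting and no value aggregation.

```python
# Sketch: image-to-image matching from raw query-key similarities.
import numpy as np

def matching_score(feats_a, feats_b):
    """feats_a: (n, d) query features, feats_b: (m, d) gallery features."""
    sim = feats_a @ feats_b.T / np.sqrt(feats_a.shape[-1])  # (n, m) similarity map
    # Best match per query location, then average: only similarity is used.
    return sim.max(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal((49, 128)), rng.standard_normal((49, 128))
    print(f"match(a, b) = {matching_score(a, b):.3f}, match(a, a) = {matching_score(a, a):.3f}")
```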
- Towards Accurate and Compact Architectures via Neural Architecture Transformer [95.4514639013144]
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- Optimizing Inference Performance of Transformers on CPUs [0.0]
Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc.
This paper presents an empirical analysis of scalability and performance of inferencing a Transformer-based model on CPUs.
arXiv Detail & Related papers (2021-02-12T17:01:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.