TurboViT: Generating Fast Vision Transformers via Generative
Architecture Search
- URL: http://arxiv.org/abs/2308.11421v1
- Date: Tue, 22 Aug 2023 13:08:29 GMT
- Title: TurboViT: Generating Fast Vision Transformers via Generative
Architecture Search
- Authors: Alexander Wong, Saad Abbasi, Saeejith Nair
- Abstract summary: Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant research recently on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
- Score: 74.24393546346974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers have shown unprecedented levels of performance in
tackling various visual perception tasks in recent years. However, the
architectural and computational complexity of such network architectures has
made them challenging to deploy in real-world applications with
high-throughput, low-memory requirements. As such, there has been significant
research recently on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer
architecture designs via generative architecture search (GAS) to achieve a
strong balance between accuracy and architectural and computational efficiency.
Through this generative architecture search process, we create TurboViT, a
highly efficient hierarchical vision transformer architecture design that is
generated around mask unit attention and Q-pooling design patterns. The
resulting TurboViT architecture design achieves significantly lower
architectural complexity (>2.47$\times$ smaller than FasterViT-0
while achieving the same accuracy) and computational complexity (>3.4$\times$ fewer
FLOPs and 0.9% higher accuracy than MobileViT2-2.0) when compared to 10 other
state-of-the-art efficient vision transformer network architecture designs
within a similar range of accuracy on the ImageNet-1K dataset. Furthermore,
TurboViT demonstrated strong inference latency and throughput in both
low-latency and batch processing scenarios (>3.21$\times$ lower latency and
>3.18$\times$ higher throughput compared to FasterViT-0 for the low-latency
scenario). These promising results demonstrate the efficacy of leveraging
generative architecture search for generating efficient transformer
architecture designs for high-throughput scenarios.
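The abstract states that TurboViT is generated around mask unit attention and Q-pooling design patterns (design elements popularized by Hiera-style hierarchical vision transformers). Below is a minimal PyTorch sketch of how these two patterns are commonly combined; the class name, argument names, and the max-pooling choice for Q-pooling are illustrative assumptions, not the actual TurboViT implementation.

```python
# Illustrative sketch only: mask unit attention restricts self-attention to
# local windows ("mask units"), and Q-pooling downsamples the queries so the
# block emits fewer tokens than it receives. All names are hypothetical.
import torch
import torch.nn as nn


class MaskUnitAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4,
                 mask_unit_size: int = 64, q_stride: int = 1):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.mask_unit_size = mask_unit_size  # tokens per local attention window
        self.q_stride = q_stride              # >1 enables Q-pooling
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); tokens must be divisible by mask_unit_size
        B, N, D = x.shape
        units = N // self.mask_unit_size
        qkv = self.qkv(x).reshape(B, units, self.mask_unit_size,
                                  3, self.num_heads, self.head_dim)
        # -> (3, batch, heads, units, tokens_per_unit, head_dim)
        qkv = qkv.permute(3, 0, 4, 1, 2, 5)
        q, k, v = qkv[0], qkv[1], qkv[2]
        if self.q_stride > 1:
            # Q-pooling: max-pool queries inside each mask unit, shrinking the
            # number of output tokens by a factor of q_stride
            q = q.reshape(B, self.num_heads, units,
                          self.mask_unit_size // self.q_stride,
                          self.q_stride, self.head_dim).amax(dim=4)
        # Attention is computed independently within each mask unit
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = attn.softmax(dim=-1) @ v
        # (batch, heads, units, out_tokens, head_dim) -> (batch, tokens_out, dim)
        out = out.permute(0, 2, 3, 1, 4).reshape(B, -1, D)
        return self.proj(out)


# Example: 256 tokens grouped into 4 mask units of 64; q_stride=2 halves the
# token count, so the output shape is (2, 128, 96).
x = torch.randn(2, 256, 96)
block = MaskUnitAttention(dim=96, num_heads=4, mask_unit_size=64, q_stride=2)
print(block(x).shape)
```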
Related papers
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702] (2023-02-27)
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
- Rethinking Vision Transformers for MobileNet Size and Speed [58.01406896628446] (2022-12-15)
We propose a novel supernet with low latency and high parameter efficiency.
We also introduce a novel fine-grained joint search strategy for transformer models.
This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.
- Efficient Neural Net Approaches in Metal Casting Defect Detection [0.0] (2022-08-08)
This research proposes a lightweight architecture that is efficient in terms of both accuracy and inference time.
Our results indicate that a custom model of 590K parameters with depth-wise separable convolutions outperformed pretrained architectures.
- Neural Architecture Search on Efficient Transformers and Beyond [23.118556295894376] (2022-07-28)
We propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.
We observe that the optimal architecture of the efficient Transformer requires less computation than that of the standard Transformer.
Our searched architecture maintains accuracy comparable to the standard Transformer with notably improved computational efficiency.
- A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation [79.265315267391] (2021-12-17)
We propose a simple and compact ViT architecture called the Universal Vision Transformer (UViT).
UViT achieves strong performance on object detection and instance segmentation tasks.
- Vision Transformer Architecture Search [64.73920718915282] (2021-06-25)
Current vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks.
We propose an architecture search method, dubbed ViTAS, to search for the optimal architecture under similar hardware budgets.
Our searched architecture achieves 74.7% top-1 accuracy on ImageNet and is 2.5% higher than the current baseline ViT architecture.
- Twins: Revisiting Spatial Attention Design in Vision Transformers [81.02454258677714] (2021-04-28)
In this work, we demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against state-of-the-art schemes.
We propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT.
Our proposed architectures are highly efficient and easy to implement, involving only matrix multiplications that are highly optimized in modern deep learning frameworks.