Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
- URL: http://arxiv.org/abs/2407.18175v1
- Date: Thu, 25 Jul 2024 16:35:46 GMT
- Title: Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
- Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang,
- Abstract summary: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs)
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
- Score: 56.37495946212932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy. First, Quasar-ViT trains a supernet using our row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques. Then, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from supernet under different inference latency targets. Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and mitigate the gap between the theoretical computation reduction and the practical inference speedup. Our searched models achieve 101.5, 159.6, and 251.6 frames-per-second (FPS) inference speed on the AMD/Xilinx ZCU102 FPGA with 80.4%, 78.6%, and 74.9% top-1 accuracy, respectively, for the ImageNet dataset, consistently outperforming prior works.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z) - Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey [6.04807281619171]
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.
This article provides a comprehensive survey of ViTs quantization and its hardware acceleration.
arXiv Detail & Related papers (2024-05-01T04:32:07Z) - ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding [9.144813021145039]
This paper introduces ParFormer, a vision transformer that incorporates a Parallel Mixer and a Sparse Channel Attention Patch Embedding (SCAPE)
ParFormer improves feature extraction by combining convolutional and attention mechanisms.
For edge device deployment, ParFormer-T excels with a throughput of 278.1 images/sec, which is 1.38 $times$ higher than EdgeNeXt-S.
The larger variant, ParFormer-L, reaches 83.5% Top-1 accuracy, offering a balanced trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-22T07:32:21Z) - PriViT: Vision Transformers for Fast Private Inference [55.36478271911595]
Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications.
ViTs are ill-suited for private inference using secure multi-party protocols, due to the large number of non-polynomial operations.
We propose PriViT, an algorithm to selectively " Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy.
arXiv Detail & Related papers (2023-10-06T21:45:05Z) - HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision
Transformers [35.92244135055901]
HeatViT is an image-adaptive token pruning framework for vision transformers (ViTs) on embedded FPGAs.
HeatViT can achieve 0.7%$sim$8.9% higher accuracy compared to existing ViT pruning studies.
HeatViT can achieve more than 28.4%$sim computation reduction, for various widely used ViTs.
arXiv Detail & Related papers (2022-11-15T13:00:43Z) - Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision
Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z) - VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs)
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
arXiv Detail & Related papers (2022-01-17T20:27:52Z) - HAO: Hardware-aware neural Architecture Optimization for Efficient
Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at framerate 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.