VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer
- URL: http://arxiv.org/abs/2201.06618v1
- Date: Mon, 17 Jan 2022 20:27:52 GMT
- Title: VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer
- Authors: Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen,
Xiaolong Ma, Zhangyang Wang, Yanzhi Wang
- Abstract summary: We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs).
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
- Score: 121.85581713299918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer architectures with attention mechanisms have achieved
success in Natural Language Processing (NLP), and Vision Transformers (ViTs)
have recently extended the application domains to various vision tasks. While
achieving high performance, ViTs suffer from large model sizes and high
computational complexity, which hinder their deployment on edge devices. To
achieve high throughput on hardware and preserve the model accuracy
simultaneously, we propose VAQF, a framework that builds inference accelerators
on FPGA platforms for quantized ViTs with binary weights and low-precision
activations. Given the model structure and the desired frame rate, VAQF will
automatically output the required quantization precision for activations as
well as the optimized parameter settings of the accelerator that fulfill the
hardware requirements. The implementations are developed with Vivado High-Level
Synthesis (HLS) on the Xilinx ZCU102 FPGA board, and the evaluation results
with the DeiT-base model indicate that a frame rate requirement of 24 frames
per second (FPS) is satisfied with 8-bit activation quantization, and a target
of 30 FPS is met with 6-bit activation quantization. To the best of our
knowledge, this is the first time quantization has been incorporated into ViT
acceleration on FPGAs with the help of a fully automatic framework to guide the
quantization strategy on the software side and the accelerator implementations
on the hardware side given the target frame rate. The compilation time cost is
very small compared with that of quantization training, and the generated
accelerators are capable of real-time execution for state-of-the-art ViT models
on FPGAs.
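The sketch below illustrates the compile-time decision the abstract describes: given a model workload and a desired frame rate, choose the lowest activation precision whose estimated accelerator throughput meets the target. It is a hypothetical illustration only; the throughput model, the parallelism constant, the function names (estimated_fps, select_activation_bits), and the DeiT-base MAC count are assumptions made for this sketch, not VAQF's actual frame-rate analysis or its Vivado HLS parameter settings.

from dataclasses import dataclass

@dataclass
class AcceleratorConfig:
    activation_bits: int        # activation quantization precision chosen by the search
    weight_bits: int = 1        # VAQF targets binary weights
    macs_per_cycle: int = 2200  # assumed effective MACs per cycle at 8-bit activations (illustrative)

def estimated_fps(total_macs: float, cfg: AcceleratorConfig, clock_hz: float = 200e6) -> float:
    # Toy throughput model: narrower activations let more multipliers be packed into the
    # same FPGA resource budget, so effective parallelism is scaled by 8 / activation_bits.
    # This scaling rule and the default clock are assumptions for illustration only.
    effective_parallelism = cfg.macs_per_cycle * (8 / cfg.activation_bits)
    cycles_per_frame = total_macs / effective_parallelism
    return clock_hz / cycles_per_frame

def select_activation_bits(total_macs: float, target_fps: float) -> AcceleratorConfig:
    # Prefer the highest precision (best accuracy) that still meets the frame rate,
    # mirroring the 8-bit-for-24-FPS / 6-bit-for-30-FPS trade-off reported for DeiT-base.
    for bits in (8, 6, 4):
        cfg = AcceleratorConfig(activation_bits=bits)
        if estimated_fps(total_macs, cfg) >= target_fps:
            return cfg
    raise ValueError("target frame rate not reachable under this toy model")

if __name__ == "__main__":
    deit_base_macs = 17.6e9  # roughly 17.6 GMACs per image for DeiT-base (approximate)
    print(select_activation_bits(deit_base_macs, target_fps=24).activation_bits)  # -> 8
    print(select_activation_bits(deit_base_macs, target_fps=30).activation_bits)  # -> 6

Under these assumed constants the sketch reproduces the outcome reported in the abstract, selecting 8-bit activations for a 24 FPS target and 6-bit activations for a 30 FPS target.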
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevalent backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
- Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey [6.04807281619171]
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.
This article provides a comprehensive survey of ViT quantization and its hardware acceleration.
arXiv Detail & Related papers (2024-05-01T04:32:07Z)
- HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers [35.92244135055901]
HeatViT is an image-adaptive token pruning framework for vision transformers (ViTs) on embedded FPGAs.
HeatViT can achieve 0.7%~8.9% higher accuracy compared to existing ViT pruning studies.
HeatViT can achieve more than 28.4% computation reduction for various widely used ViTs.
arXiv Detail & Related papers (2022-11-15T13:00:43Z)
- Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using DeiT-B model on ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.