VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer
- URL: http://arxiv.org/abs/2201.06618v1
- Date: Mon, 17 Jan 2022 20:27:52 GMT
- Title: VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit
Vision Transformer
- Authors: Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen,
Xiaolong Ma, Zhangyang Wang, Yanzhi Wang
- Abstract summary: We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs).
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
- Score: 121.85581713299918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer architectures with attention mechanisms have achieved success
in Natural Language Processing (NLP), and Vision Transformers (ViTs) have
recently extended the application domains to various vision tasks. While
achieving high performance, ViTs suffer from large model size and high
computation complexity that hinder their deployment on edge devices. To
achieve high throughput on hardware and preserve the model accuracy
simultaneously, we propose VAQF, a framework that builds inference accelerators
on FPGA platforms for quantized ViTs with binary weights and low-precision
activations. Given the model structure and the desired frame rate, VAQF will
automatically output the required quantization precision for activations as
well as the optimized parameter settings of the accelerator that fulfill the
hardware requirements. The implementations are developed with Vivado High-Level
Synthesis (HLS) on the Xilinx ZCU102 FPGA board, and the evaluation results
with the DeiT-base model indicate that a frame rate requirement of 24 frames
per second (FPS) is satisfied with 8-bit activation quantization, and a target
of 30 FPS is met with 6-bit activation quantization. To the best of our
knowledge, this is the first time quantization has been incorporated into ViT
acceleration on FPGAs with the help of a fully automatic framework to guide the
quantization strategy on the software side and the accelerator implementations
on the hardware side given the target frame rate. The compilation time cost is
very small compared with quantization training, and the generated accelerators
are capable of real-time execution for state-of-the-art ViT models on FPGAs.
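The abstract describes accelerating ViTs quantized to binary weights with low-precision activations. The sketch below illustrates that scheme in plain numpy; the sign-based weight binarization with a mean-absolute scaling factor and the symmetric uniform activation quantizer are assumptions for illustration, not the exact quantizers VAQF trains with.

```python
# Illustrative sketch (not VAQF's code): binary weights + k-bit activations.
import numpy as np

def binarize_weights(w: np.ndarray) -> np.ndarray:
    """Map weights to {-alpha, +alpha} with alpha = mean(|w|) (assumed XNOR-style scaling)."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def quantize_activations(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform k-bit fake-quantization of activations (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Example: one DeiT-style linear layer with binary weights and 6-bit activations.
rng = np.random.default_rng(0)
x = quantize_activations(rng.standard_normal((197, 768)), bits=6)   # 197 tokens
w = binarize_weights(rng.standard_normal((768, 768)))
y = x @ w
```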
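VAQF's compilation step, as described above, takes the model structure and a target frame rate and outputs the required activation precision together with accelerator parameter settings. The following is a hedged sketch of how such a search could be structured; `estimate_fps`, the candidate bit-widths, and the `parallelism` factor are hypothetical stand-ins for VAQF's actual resource and latency model of the ZCU102 accelerator.

```python
# Hypothetical sketch of frame-rate-driven precision selection (not VAQF's real interface).
from dataclasses import dataclass

@dataclass
class AcceleratorConfig:
    act_bits: int      # activation precision (weights are binary)
    parallelism: int   # illustrative compute-unrolling factor
    est_fps: float     # estimated throughput for this setting

def estimate_fps(act_bits: int, parallelism: int) -> float:
    """Placeholder throughput model: lower activation precision frees FPGA resources,
    so more parallelism fits and estimated FPS rises. Numbers are illustrative only."""
    return parallelism * (8.0 / act_bits) * 1.5

def compile_accelerator(target_fps: float,
                        candidate_bits=(8, 6, 4),
                        max_parallelism: int = 16) -> AcceleratorConfig:
    # Try higher activation precision first, since it preserves more accuracy,
    # and accept the first setting whose estimated FPS meets the target.
    for bits in candidate_bits:
        for par in range(1, max_parallelism + 1):
            fps = estimate_fps(bits, par)
            if fps >= target_fps:
                return AcceleratorConfig(bits, par, fps)
    raise ValueError("target frame rate not reachable with candidate settings")

print(compile_accelerator(24.0))   # meets 24 FPS at 8-bit activations (toy model)
print(compile_accelerator(30.0))   # needs 6-bit activations to reach 30 FPS (toy model)
```

Under this toy model the outcomes happen to mirror the DeiT-base results in the abstract (24 FPS at 8-bit, 30 FPS at 6-bit); the real framework derives them from Vivado HLS resource and timing estimates.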
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
- Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey [6.04807281619171]
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.
This article provides a comprehensive survey of ViTs quantization and its hardware acceleration.
arXiv Detail & Related papers (2024-05-01T04:32:07Z)
- I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization [63.07712842509526]
We introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion.
I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
arXiv Detail & Related papers (2023-11-16T13:07:47Z)
- HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers [35.92244135055901]
HeatViT is an image-adaptive token pruning framework for vision transformers (ViTs) on embedded FPGAs.
HeatViT can achieve 0.7%~8.9% higher accuracy compared to existing ViT pruning studies.
HeatViT can achieve more than 28.4% computation reduction for various widely used ViTs.
arXiv Detail & Related papers (2022-11-15T13:00:43Z)
- Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z)
- Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
We can obtain an 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)