An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
- URL: http://arxiv.org/abs/2403.20230v1
- Date: Fri, 29 Mar 2024 15:20:33 GMT
- Title: An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
- Authors: Haikuo Shao, Huihong Shi, Wendong Mao, Zhongfeng Wang
- Abstract summary: We propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs.
Specifically, we design a reconfigurable architecture to efficiently support various operation types, including lightweight convolutions and attention.
Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200 MHz.
- Score: 5.141764719319689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have achieved significant success in computer vision. However, their intensive computations and massive memory footprint challenge ViTs' deployment on embedded devices, calling for efficient ViTs. Among them, EfficientViT, the state-of-the-art one, features a Convolution-Transformer hybrid architecture, enhancing both accuracy and hardware efficiency. Unfortunately, existing accelerators cannot fully exploit the hardware benefits of EfficientViT due to its unique architecture. In this paper, we propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs. Specifically, we design a reconfigurable architecture to efficiently support various operation types, including lightweight convolutions and attention, boosting hardware utilization. Additionally, we present a time-multiplexed and pipelined dataflow to facilitate both intra- and inter-layer fusions, reducing off-chip data access costs. Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200 MHz on the Xilinx ZCU102 FPGA, which significantly outperforms prior works.
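As a quick sanity check on the headline numbers, 780.2 GOPS at 105.1 GOPS/W corresponds to roughly 780.2 / 105.1 ≈ 7.4 W of power. The sketch below is a minimal, software-only illustration (not the authors' RTL) of the two operation types such a reconfigurable engine must serve: depthwise convolution and ReLU linear attention of the kind used in EfficientViT, dispatched through one shared entry point to mimic time-multiplexed reconfiguration. All function and parameter names here are hypothetical.

```python
import numpy as np

def depthwise_conv(x, w):
    """Depthwise convolution with 'same' padding.
    x: (C, H, W) feature map, w: (C, k, k) per-channel kernels."""
    C, H, W = x.shape
    k = w.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):                # every channel uses its own small kernel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * w[c])
    return out

def linear_attention(q, k, v):
    """ReLU linear attention (O(N) in token count N); q, k, v: (N, d)."""
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)    # ReLU feature maps
    kv = k.T @ v                                     # (d, d) global summary
    z = q @ k.sum(axis=0, keepdims=True).T + 1e-6    # per-token normalizer
    return (q @ kv) / z

def reconfigurable_engine(op, *operands):
    """One shared compute path, 'reconfigured' per operation type."""
    if op == "dwconv":
        return depthwise_conv(*operands)
    if op == "attention":
        return linear_attention(*operands)
    raise ValueError(f"unsupported op: {op}")

if __name__ == "__main__":
    x, w = np.random.rand(8, 16, 16), np.random.rand(8, 3, 3)
    q = k = v = np.random.rand(64, 32)
    print(reconfigurable_engine("dwconv", x, w).shape)        # (8, 16, 16)
    print(reconfigurable_engine("attention", q, k, v).shape)  # (64, 32)
```

In hardware, the same multiply-accumulate array would be reprogrammed between these two modes rather than dispatched in software; that shared, reconfigurable datapath is what lets the accelerator keep its compute units busy across the hybrid Convolution-Transformer layers.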
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy to convolutional neural networks (CNNs) on computer vision tasks.
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z) - HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline [13.849471252921004]
Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging.
We introduce HG-PIPE, a pipelined FPGA accelerator for high-throughput and low-latency ViT processing.
With a VCK190 FPGA, HG-PIPE realizes end-to-end ViT acceleration on a single device and achieves 7118 images/s, which is 2.81 times faster than a V100 GPU.
arXiv Detail & Related papers (2024-07-25T08:47:40Z) - CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference [4.523939613157408]
Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision.
This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and offers an automated flow for ViT deployment on FPGAs.
CHOSEN achieves 1.5x and 1.42x throughput improvements on the DeiT-S and DeiT-B models, respectively.
arXiv Detail & Related papers (2024-07-17T16:56:06Z) - TurboViT: Generating Fast Vision Transformers via Generative Architecture Search [74.24393546346974]
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has recently been significant research on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z) - Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z) - Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts [60.1586169973792]
M³ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE).
MoE achieves better accuracy and over 80% reduction in computation, but leaves challenges for efficient deployment on FPGAs.
Our work, dubbed Edge-MoE, addresses these challenges and introduces the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations.
arXiv Detail & Related papers (2023-05-30T02:24:03Z) - Design optimization for high-performance computing using FPGA [0.0]
We optimize Tensil AI's open-source inference accelerator for maximum performance using ResNet20 trained on CIFAR.
Running the CIFAR test data set shows very little accuracy drop when rounding down from the original 32-bit floating point.
The proposed accelerator achieves a throughput of 21.12 Giga-Operations Per Second (GOP/s) with a 5.21 W on-chip power consumption at 100 MHz.
arXiv Detail & Related papers (2023-04-24T22:20:42Z) - ViTA: A Vision Transformer Inference Accelerator for Edge Applications [4.3469216446051995]
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks.
However, they are compute-heavy and difficult to deploy on resource-constrained edge devices.
We propose ViTA - a hardware accelerator for inference of vision transformer models, targeting resource-constrained edge computing devices.
arXiv Detail & Related papers (2023-02-17T19:35:36Z) - Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z) - Row-wise Accelerator for Vision Transformer [4.802171139840781]
This paper proposes a hardware accelerator for vision transformers with row-wise scheduling.
The implementation in TSMC 40 nm CMOS technology requires only a 262K gate count and a 149 KB buffer for 403.2 GOPS throughput at a 600 MHz clock frequency.
arXiv Detail & Related papers (2022-05-09T01:47:44Z) - VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs).
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations; a toy sketch of this kind of precision search is given after this list.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
arXiv Detail & Related papers (2022-01-17T20:27:52Z)
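The VAQF entry above describes a flow that, given a model and a target frame rate, selects the activation quantization precision automatically. The following is a purely illustrative Python sketch of that kind of precision search; the latency model and every name in it are invented for illustration and are not VAQF's actual compilation flow.

```python
# Hypothetical sketch of a VAQF-style precision search: pick the widest
# activation bit-width whose estimated frame rate still meets a user target.

def estimated_fps(act_bits: int, base_fps: float = 20.0) -> float:
    """Toy model: narrower activations let the FPGA pack more MACs per DSP/LUT,
    so throughput grows roughly as (full-precision width / chosen width)."""
    return base_fps * (16.0 / act_bits)

def choose_activation_precision(target_fps: float, candidates=(16, 8, 6, 4)) -> int:
    """Return the widest (most accurate) bit-width that still meets target_fps."""
    feasible = [b for b in candidates if estimated_fps(b) >= target_fps]
    if not feasible:
        raise ValueError("target frame rate unreachable under this toy model")
    return max(feasible)  # prefer more precision when several widths qualify

if __name__ == "__main__":
    print(choose_activation_precision(target_fps=30.0))  # -> 8 under the toy model
```

Choosing the widest feasible bit-width keeps quantization error low while still meeting the frame-rate constraint, which mirrors the accuracy/throughput trade-off these software-hardware co-design frameworks automate.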
This list is automatically generated from the titles and abstracts of the papers on this site.