Row-wise Accelerator for Vision Transformer
- URL: http://arxiv.org/abs/2205.03998v1
- Date: Mon, 9 May 2022 01:47:44 GMT
- Title: Row-wise Accelerator for Vision Transformer
- Authors: Hong-Yi Wang and Tian-Sheuan Chang
- Abstract summary: This paper proposes a hardware accelerator for vision transformers with row-wise scheduling.
The implementation in TSMC 40nm CMOS technology requires only a 262K gate count and a 149KB SRAM buffer for 403.2 GOPS throughput at a 600MHz clock frequency.
- Score: 4.802171139840781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following the success of transformers in natural language processing,
transformers for vision applications have attracted significant attention in
recent years due to their excellent performance. However, existing deep learning
hardware accelerators for vision cannot execute this structure efficiently
because of significant differences in model architecture. This paper therefore
proposes a hardware accelerator for vision transformers with row-wise scheduling,
which decomposes the major operations in vision transformers into a single
dot-product primitive for unified and efficient execution. Furthermore, by
sharing weights across columns, we reuse data and reduce memory usage. The
implementation in TSMC 40nm CMOS technology requires only a 262K gate count and a
149KB SRAM buffer to deliver 403.2 GOPS throughput at a 600MHz clock frequency.
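To make the row-wise scheduling idea concrete, below is a minimal software sketch (not the authors' hardware design): every major vision-transformer operation is a matrix multiplication, and a matrix multiplication can be evaluated row by row as repeated dot products, so one dot-product datapath can serve all layer types. The function name and the toy sizes are illustrative assumptions.

```python
import numpy as np

def row_wise_matmul(X, W):
    """Evaluate Y = X @ W one output row at a time.

    QKV projections, attention scores (Q @ K^T), and FFN layers in a
    vision transformer all reduce to this pattern, so a single
    dot-product primitive can execute every major operation.  The inner
    loop walks the columns of W; in the accelerator described above, one
    such weight column would be broadcast to several processing elements
    working on different rows at once, which is the column-wise weight
    reuse mentioned in the abstract.
    """
    n_rows, depth = X.shape
    assert W.shape[0] == depth
    n_cols = W.shape[1]
    Y = np.zeros((n_rows, n_cols), dtype=np.float32)
    for i in range(n_rows):                      # row-wise schedule
        for j in range(n_cols):
            Y[i, j] = np.dot(X[i, :], W[:, j])   # the single dot-product primitive
    return Y

# Toy check on a ViT-like projection: 4 tokens, embedding dimension 8.
X = np.random.rand(4, 8).astype(np.float32)
W = np.random.rand(8, 8).astype(np.float32)
assert np.allclose(row_wise_matmul(X, W), X @ W, atol=1e-5)
```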
Related papers
- CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference [4.523939613157408]
Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision.
This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and automates ViT deployment on FPGAs.
CHOSEN achieves 1.5x and 1.42x throughput improvements on the DeiT-S and DeiT-B models, respectively.
arXiv Detail & Related papers (2024-07-17T16:56:06Z) - An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT [5.141764719319689]
We propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs.
Specifically, we design a reconfigurable architecture to efficiently support various operation types, including lightweight convolutions and attention.
Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200MHz.
arXiv Detail & Related papers (2024-03-29T15:20:33Z) - MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z) - ViTA: A Vision Transformer Inference Accelerator for Edge Applications [4.3469216446051995]
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks.
However, they are compute-heavy and difficult to deploy on resource-constrained edge devices.
We propose ViTA - a hardware accelerator for inference of vision transformer models, targeting resource-constrained edge computing devices.
arXiv Detail & Related papers (2023-02-17T19:35:36Z) - Reversible Vision Transformers [74.3500977090597]
Reversible Vision Transformers are a memory-efficient architecture for visual recognition.
We adapt two popular models, namely Vision Transformer and Multiscale Vision Transformers, to reversible variants.
We find that, for deeper models, the memory savings more than offset the additional computational burden of recomputing activations.
arXiv Detail & Related papers (2023-02-09T18:59:54Z) - Dynamic Grained Encoder for Vision Transformers [150.02797954201424]
This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images.
We propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.
Our encoder allows state-of-the-art vision transformers to reduce computational complexity by 40%-60% while maintaining comparable performance on image classification.
arXiv Detail & Related papers (2023-01-10T07:55:29Z) - HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling [126.89573619301953]
We propose a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT).
HiViT enjoys both high efficiency and good performance in MIM.
When running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B.
arXiv Detail & Related papers (2022-05-30T09:34:44Z) - An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z) - Improving the Efficiency of Transformers for Resource-Constrained Devices [1.3019517863608956]
We present a performance analysis of state-of-the-art vision transformers on several devices.
We show that by using only 64 clusters to represent model parameters, it is possible to reduce data transfer from main memory by more than 4x; a minimal clustering sketch illustrating this idea follows below.
arXiv Detail & Related papers (2021-06-30T12:10:48Z)
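As a rough illustration of the last entry above, here is a minimal weight-clustering sketch, assuming the 64 clusters are obtained with k-means over a layer's parameters so that every weight is stored as a small index into a shared codebook. The function name and the use of scikit-learn are illustrative assumptions, not that paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(w, n_clusters=64):
    """Quantize a weight tensor to n_clusters shared values via k-means.

    Each weight is replaced by a small index (6 bits for 64 clusters,
    stored here as uint8) into a shared codebook, so the data read from
    main memory shrinks by roughly 4-5x compared with 32-bit floats --
    the kind of saving quoted in the last related paper above.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    idx = km.fit_predict(w.reshape(-1, 1))       # one cluster index per weight
    codebook = km.cluster_centers_.ravel()       # 64 shared weight values
    return idx.astype(np.uint8).reshape(w.shape), codebook

# Toy example on one small projection matrix.
w = np.random.randn(64, 64).astype(np.float32)
idx, codebook = cluster_weights(w)
w_hat = codebook[idx]                            # dequantized weights
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```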