TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
- URL: http://arxiv.org/abs/2408.12742v1
- Date: Thu, 22 Aug 2024 21:51:38 GMT
- Title: TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
- Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda
- Abstract summary: We propose TReX, an attention-reuse-driven ViT optimization framework.
We find that TReX achieves 2.3x (2.19x) EDAP reduction and 1.86x (1.79x) TOPS/mm2 improvement with ~1% accuracy drop for the DeiT-S (LV-ViT-S) ViT models.
On NLP tasks such as CoLA, TReX leads to 2% higher non-ideal accuracy compared to baseline at 1.6x lower EDAP.
- Score: 12.583079680322156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the high computation overhead of Vision Transformers (ViTs), in-memory computing (IMC) architectures are being researched for energy-efficient deployment in edge-computing scenarios. Prior works have proposed efficient algorithm-hardware co-design and IMC-architectural improvements to improve the energy efficiency of IMC-implemented ViTs. However, all prior works have neglected the overhead and co-dependence of attention blocks on the accuracy-energy-delay-area of IMC-implemented ViTs. To this end, we propose TReX, an attention-reuse-driven ViT optimization framework that effectively performs attention reuse in ViT models to achieve optimal accuracy-energy-delay-area tradeoffs. TReX optimally chooses the transformer encoders for attention reuse to achieve near iso-accuracy performance while meeting the user-specified delay requirement. Based on our analysis on the ImageNet-1k dataset, we find that TReX achieves 2.3x (2.19x) EDAP reduction and 1.86x (1.79x) TOPS/mm2 improvement with ~1% accuracy drop for the DeiT-S (LV-ViT-S) ViT models. Additionally, TReX achieves high accuracy at high EDAP reduction compared to state-of-the-art token pruning and weight sharing approaches. On NLP tasks such as CoLA, TReX leads to 2% higher non-ideal accuracy compared to baseline at 1.6x lower EDAP.
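To make the attention-reuse idea concrete, the sketch below shows a minimal PyTorch-style ViT encoder stack in which selected blocks skip their own multi-head attention (MHA) and reuse the attention output cached by an earlier block; avoiding those MHA computations is what lowers the energy-delay-area product (EDAP) on a crossbar-based IMC substrate. This is a sketch under stated assumptions, not TReX's implementation: the class names, the `reuse_from` schedule, and the choice to reuse the attention output (rather than another intermediate) are illustrative, and TReX's actual contribution of optimally selecting which encoders reuse attention under a user-specified delay constraint is not modeled here.

```python
# Illustrative attention-reuse sketch (hypothetical names, not the TReX API).
import torch
import torch.nn as nn


class ReuseViTBlock(nn.Module):
    """Pre-norm ViT encoder block that can either compute multi-head
    self-attention or reuse an attention output cached by an earlier block."""

    def __init__(self, dim: int, heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x, reused_attn=None):
        if reused_attn is None:
            # Normal path: compute multi-head self-attention.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
        else:
            # Reuse path: skip this block's MHA (and the associated crossbar
            # operations) and reuse the attention output of an earlier encoder.
            attn_out = reused_attn
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x, attn_out


class TinyReuseViT(nn.Module):
    """Encoder stack; reuse_from[i] = j tells block i to reuse block j's
    cached attention output instead of recomputing it (None = compute)."""

    def __init__(self, depth: int = 6, dim: int = 192, heads: int = 3, reuse_from=None):
        super().__init__()
        self.blocks = nn.ModuleList([ReuseViTBlock(dim, heads) for _ in range(depth)])
        self.reuse_from = reuse_from or [None] * depth

    def forward(self, tokens):
        cache = {}
        for i, blk in enumerate(self.blocks):
            src = self.reuse_from[i]  # index of the block whose attention is reused
            tokens, attn_out = blk(tokens, reused_attn=cache.get(src))
            cache[i] = attn_out
        return tokens


if __name__ == "__main__":
    # Hypothetical reuse schedule: blocks 3-5 reuse block 2's attention output.
    model = TinyReuseViT(depth=6, reuse_from=[None, None, None, 2, 2, 2])
    out = model(torch.randn(1, 197, 192))  # (batch, tokens, embed_dim)
    print(out.shape)  # torch.Size([1, 197, 192])
```

In this toy configuration, three of the six MHA computations are skipped, trading a possible accuracy drop for lower energy, delay, and area; TReX searches for the reuse configuration that keeps accuracy near the original model while meeting the delay target.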
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer [0.0]
We propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE).
LATTE employs a head-wise threshold-based filter with a low-precision dot product to reduce the computation of Multi-Head Attention (MHA).
Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings.
arXiv Detail & Related papers (2024-04-11T07:23:19Z)
- Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z)
- DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices [42.89175608336226]
Vision transformer (ViT) has achieved state-of-the-art performance on multiple computer vision benchmarks.
However, ViT models suffer from a vast number of parameters and high computation cost, making them difficult to deploy on resource-constrained edge devices.
We propose a collaborative inference framework termed DeViT to facilitate edge deployment by decomposing large ViTs.
arXiv Detail & Related papers (2023-09-10T12:26:17Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention [11.999596399083089]
We propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC.
With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 3.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction.
arXiv Detail & Related papers (2022-11-25T08:37:17Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves a much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction [16.578899848650675]
Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices.
We propose a cascade pruning framework named CP-ViT by predicting sparsity in ViT models progressively and dynamically to reduce computational redundancy while minimizing the accuracy loss.
arXiv Detail & Related papers (2022-03-09T08:15:14Z)
- Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criteria comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.