TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
- URL: http://arxiv.org/abs/2408.12742v1
- Date: Thu, 22 Aug 2024 21:51:38 GMT
- Title: TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
- Authors: Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda
- Abstract summary: We propose TReX, an attention-reuse-driven ViT optimization framework.
We find that TReX achieves 2.3x (2.19x) EDAP reduction and 1.86x (1.79x) TOPS/mm2 improvement with ~1% accuracy drop for the DeiT-S (LV-ViT-S) ViT models.
On NLP tasks such as CoLA, TReX leads to 2% higher non-ideal accuracy compared to baseline at 1.6x lower EDAP.
- Score: 12.583079680322156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the high computation overhead of Vision Transformers (ViTs), in-memory computing (IMC) architectures are being researched for energy-efficient deployment in edge-computing scenarios. Prior works have proposed efficient algorithm-hardware co-design and IMC-architectural improvements to improve the energy efficiency of IMC-implemented ViTs. However, all prior works have neglected the overhead and co-dependence of attention blocks on the accuracy-energy-delay-area of IMC-implemented ViTs. To this end, we propose TReX, an attention-reuse-driven ViT optimization framework that effectively performs attention reuse in ViT models to achieve optimal accuracy-energy-delay-area tradeoffs. TReX optimally chooses the transformer encoders for attention reuse to achieve near iso-accuracy performance while meeting the user-specified delay requirement. Based on our analysis on the ImageNet-1k dataset, we find that TReX achieves 2.3x (2.19x) EDAP reduction and 1.86x (1.79x) TOPS/mm2 improvement with ~1% accuracy drop for the DeiT-S (LV-ViT-S) ViT models. Additionally, TReX achieves high accuracy at high EDAP reduction compared to state-of-the-art token pruning and weight sharing approaches. On NLP tasks such as CoLA, TReX leads to 2% higher non-ideal accuracy compared to baseline at 1.6x lower EDAP.
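To make the attention-reuse idea concrete, the sketch below shows a minimal PyTorch-style ViT encoder stack in which selected blocks skip their own multi-head attention (MHA) and reuse the attention output cached by an earlier block; avoiding those MHA computations is what lowers the energy-delay-area product (EDAP) on a crossbar-based IMC substrate. This is a sketch under stated assumptions, not TReX's implementation: the class names, the `reuse_from` schedule, and the choice to reuse the attention output (rather than another intermediate) are illustrative, and TReX's actual contribution of optimally selecting which encoders reuse attention under a user-specified delay constraint is not modeled here.

```python
# Illustrative attention-reuse sketch (hypothetical names, not the TReX API).
import torch
import torch.nn as nn


class ReuseViTBlock(nn.Module):
    """Pre-norm ViT encoder block that can either compute multi-head
    self-attention or reuse an attention output cached by an earlier block."""

    def __init__(self, dim: int, heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x, reused_attn=None):
        if reused_attn is None:
            # Normal path: compute multi-head self-attention.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
        else:
            # Reuse path: skip this block's MHA (and the associated crossbar
            # operations) and reuse the attention output of an earlier encoder.
            attn_out = reused_attn
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x, attn_out


class TinyReuseViT(nn.Module):
    """Encoder stack; reuse_from[i] = j tells block i to reuse block j's
    cached attention output instead of recomputing it (None = compute)."""

    def __init__(self, depth: int = 6, dim: int = 192, heads: int = 3, reuse_from=None):
        super().__init__()
        self.blocks = nn.ModuleList([ReuseViTBlock(dim, heads) for _ in range(depth)])
        self.reuse_from = reuse_from or [None] * depth

    def forward(self, tokens):
        cache = {}
        for i, blk in enumerate(self.blocks):
            src = self.reuse_from[i]  # index of the block whose attention is reused
            tokens, attn_out = blk(tokens, reused_attn=cache.get(src))
            cache[i] = attn_out
        return tokens


if __name__ == "__main__":
    # Hypothetical reuse schedule: blocks 3-5 reuse block 2's attention output.
    model = TinyReuseViT(depth=6, reuse_from=[None, None, None, 2, 2, 2])
    out = model(torch.randn(1, 197, 192))  # (batch, tokens, embed_dim)
    print(out.shape)  # torch.Size([1, 197, 192])
```

In this toy configuration, three of the six MHA computations are skipped, trading a possible accuracy drop for lower energy, delay, and area; TReX searches for the reuse configuration that keeps accuracy near the original model while meeting the delay target.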
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer [0.0]
We propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE).
LATTE employs a head-wise threshold-based filter with a low-precision dot product to reduce the computation of Multi-Head Attention (MHA).
Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings.
arXiv Detail & Related papers (2024-04-11T07:23:19Z)
- Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z)
- DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices [42.89175608336226]
Vision transformer (ViT) has achieved state-of-the-art performance on multiple computer vision benchmarks.
However, ViT models suffer from a vast number of parameters and high computation cost, making them difficult to deploy on resource-constrained edge devices.
We propose a collaborative inference framework termed DeViT to facilitate edge deployment by decomposing large ViTs.
arXiv Detail & Related papers (2023-09-10T12:26:17Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention [11.999596399083089]
We propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC.
With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 3.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction.
arXiv Detail & Related papers (2022-11-25T08:37:17Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves a much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction [16.578899848650675]
Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices.
We propose a cascade pruning framework named CP-ViT by predicting sparsity in ViT models progressively and dynamically to reduce computational redundancy while minimizing the accuracy loss.
arXiv Detail & Related papers (2022-03-09T08:15:14Z)
- Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criteria comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.