HeTraX: Energy Efficient 3D Heterogeneous Manycore Architecture for Transformer Acceleration
- URL: http://arxiv.org/abs/2408.03397v1
- Date: Tue, 6 Aug 2024 18:48:01 GMT
- Title: HeTraX: Energy Efficient 3D Heterogeneous Manycore Architecture for Transformer Acceleration
- Authors: Pratyush Dhingra, Janardhan Rao Doppa, Partha Pratim Pande,
- Abstract summary: We propose the design of a three-dimensional heterogeneous architecture referred to as HeTraX specifically optimized to accelerate transformer models.
Experimental results show that HeTraX outperforms existing state-of-the-art by up to 5.6x in speedup and improves EDP by 14.5x while ensuring thermally feasibility.
- Score: 18.355570259898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have revolutionized deep learning and generative modeling to enable unprecedented advancements in natural language processing tasks and beyond. However, designing hardware accelerators for executing transformer models is challenging due to the wide variety of computing kernels involved in the transformer architecture. Existing accelerators are either inadequate to accelerate end-to-end transformer models or suffer notable thermal limitations. In this paper, we propose the design of a three-dimensional heterogeneous architecture referred to as HeTraX specifically optimized to accelerate end-to-end transformer models. HeTraX employs hardware resources aligned with the computational kernels of transformers and optimizes both performance and energy. Experimental results show that HeTraX outperforms existing state-of-the-art by up to 5.6x in speedup and improves EDP by 14.5x while ensuring thermally feasibility.
Related papers
- Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment [3.391499691517567]
Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices.
We propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization.
Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators.
arXiv Detail & Related papers (2024-07-16T12:36:10Z) - TurboViT: Generating Fast Vision Transformers via Generative
Architecture Search [74.24393546346974]
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant research recently on the design of efficient vision transformer architecture.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z) - TransCODE: Co-design of Transformers and Accelerators for Efficient
Training and Inference [6.0093441900032465]
We propose a framework that simulates transformer inference and training on a design space of accelerators.
We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models.
The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair.
arXiv Detail & Related papers (2023-03-27T02:45:18Z) - AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with
Transformers [6.0093441900032465]
Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing.
Previous works directly operate on large matrices involved in the attention operation, which limits hardware utilization.
We propose a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead.
arXiv Detail & Related papers (2023-02-28T16:17:23Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - A K-variate Time Series Is Worth K Words: Evolution of the Vanilla
Transformer Architecture for Long-term Multivariate Time Series Forecasting [52.33042819442005]
Transformer has become the de facto solution for MTSF, especially for the long-term cases.
In this study, we point out that the current tokenization strategy in MTSF Transformer architectures ignores the token inductive bias of Transformers.
We make a series of evolution on the basic architecture of the vanilla MTSF transformer.
Surprisingly, the evolved simple transformer architecture is highly effective, which successfully avoids the over-smoothing phenomena in the vanilla MTSF transformer.
arXiv Detail & Related papers (2022-12-06T07:00:31Z) - Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision
Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z) - Vis-TOP: Visual Transformer Overlay Processor [9.80151619872144]
Transformer has achieved good results in Natural Language Processing (NLP) and has also started to expand into Computer Vision (CV)
We propose Vis-TOP, an overlay processor for various visual Transformer models.
Vis-TOP summarizes the characteristics of all visual Transformer models and implements a three-layer and two-level transformation structure.
arXiv Detail & Related papers (2021-10-21T08:11:12Z) - AutoTrans: Automating Transformer Design via Reinforced Architecture
Search [52.48985245743108]
This paper empirically explore how to set layer-norm, whether to scale, number of layers, number of heads, activation function, etc, so that one can obtain a transformer architecture that better suits the tasks at hand.
Experiments on the CoNLL03, Multi-30k, IWSLT14 and WMT-14 shows that the searched transformer model can outperform the standard transformers.
arXiv Detail & Related papers (2020-09-04T08:46:22Z) - Transformer on a Diet [81.09119185568296]
Transformer has been widely used thanks to its ability to capture sequence information in an efficient way.
Recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness.
We explore three carefully-designed light Transformer architectures to figure out whether the Transformer with less computations could produce competitive results.
arXiv Detail & Related papers (2020-02-14T18:41:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.