Benchmarking and In-depth Performance Study of Large Language Models on
Habana Gaudi Processors
- URL: http://arxiv.org/abs/2309.16976v1
- Date: Fri, 29 Sep 2023 04:49:35 GMT
- Title: Benchmarking and In-depth Performance Study of Large Language Models on
Habana Gaudi Processors
- Authors: Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng,
Kamil Iskra, Pete Beckman, Dingwen Tao
- Abstract summary: Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements.
Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues.
This paper explores the untapped potential of using GAUDI processors to accelerate Transformer-based models, addressing key challenges in the process.
- Score: 5.432613942292548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models have achieved remarkable success in various machine
learning tasks but suffer from high computational complexity and resource
requirements. The quadratic complexity of the self-attention mechanism further
exacerbates these challenges when dealing with long sequences and large
datasets. Specialized AI hardware accelerators, such as the Habana GAUDI
architecture, offer a promising solution to tackle these issues. GAUDI features
a Matrix Multiplication Engine (MME) and a cluster of fully programmable Tensor
Processing Cores (TPC). This paper explores the untapped potential of using
GAUDI processors to accelerate Transformer-based models, addressing key
challenges in the process. Firstly, we provide a comprehensive performance
comparison between the MME and TPC components, illuminating their relative
strengths and weaknesses. Secondly, we explore strategies to optimize MME and
TPC utilization, offering practical insights to enhance computational
efficiency. Thirdly, we evaluate the performance of Transformers on GAUDI,
particularly in handling long sequences and uncovering performance bottlenecks.
Lastly, we evaluate the end-to-end performance of two Transformer-based large
language models (LLMs) on GAUDI. The contributions of this work encompass
practical insights for practitioners and researchers alike. We delve into
GAUDI's capabilities for Transformers through systematic profiling, analysis,
and optimization exploration. Our study bridges a research gap and offers a
roadmap for optimizing Transformer-based model training on the GAUDI
architecture.
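As a rough illustration of the abstract's points about the quadratic cost of self-attention and the split between matrix-multiplication work (handled by the MME) and elementwise/vector work (handled by the TPC), the sketch below times a naive scaled-dot-product attention layer in PyTorch at increasing sequence lengths. It is a minimal sketch, not the paper's benchmarking harness; the "hpu" device string and the habana_frameworks.torch.core import assume Habana's SynapseAI PyTorch bridge is installed, and the helper names (pick_device, self_attention) are illustrative. Without the bridge the sketch falls back to CPU.

```python
# Minimal sketch (not the paper's harness): time naive self-attention at
# growing sequence lengths to illustrate its O(L^2) cost in the sequence
# length L. Running on Gaudi assumes Habana's PyTorch bridge is installed so
# that torch.device("hpu") is available; otherwise we fall back to CPU.
import time
import torch

def self_attention(q, k, v):
    # q, k, v: (batch, heads, L, d). The score matrix is (L, L), i.e. quadratic in L.
    d = q.shape[-1]
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # matmul work (MME-eligible)
    weights = torch.softmax(scores, dim=-1)                   # elementwise/vector work (TPC-style)
    return torch.matmul(weights, v)

def pick_device():
    try:
        import habana_frameworks.torch.core  # noqa: F401  -- assumption: Habana bridge present
        return torch.device("hpu")
    except ImportError:
        return torch.device("cpu")

if __name__ == "__main__":
    device = pick_device()
    batch, heads, d_head = 1, 12, 64
    for seq_len in (512, 1024, 2048, 4096):
        q = torch.randn(batch, heads, seq_len, d_head, device=device)
        k = torch.randn_like(q)
        v = torch.randn_like(q)
        start = time.perf_counter()
        out = self_attention(q, k, v)
        out.cpu()  # pull the result to host to force execution on lazy/async backends
        print(f"L={seq_len:5d}  attention time: {time.perf_counter() - start:.4f}s")
```

Doubling the sequence length roughly quadruples the attention-score computation, which is the long-sequence bottleneck the study profiles on GAUDI.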
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? [82.51626700527837]
Transformer-based large language models have displayed impressive in-context learning (ICL) capabilities, where a pre-trained model can handle new tasks without fine-tuning.
We analyze the mechanics by which Transformers achieve ICL and how they contribute to the technical challenges of analyzing Transformer training.
arXiv Detail & Related papers (2024-02-23T21:07:20Z) - A Comprehensive Performance Study of Large Language Models on Novel AI
Accelerators [2.88634411143577]
Large language models (LLMs) are being considered as a promising approach to addressing a range of challenging problems.
Specialized AI accelerator hardware systems have recently become available for accelerating AI applications.
arXiv Detail & Related papers (2023-10-06T21:55:57Z) - A survey on efficient vision transformers: algorithms, techniques, and
performance benchmarking [19.65897437342896]
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications.
This paper mathematically defines the strategies used to make Vision Transformer efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios.
arXiv Detail & Related papers (2023-09-05T08:21:16Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss computation are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z) - Optimizing Inference Performance of Transformers on CPUs [0.0]
Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc.
This paper presents an empirical analysis of the scalability and performance of Transformer-based model inference on CPUs.
arXiv Detail & Related papers (2021-02-12T17:01:35Z) - A Learned Performance Model for Tensor Processing Units [5.733911161090224]
We demonstrate a method of learning performance models from a corpus of graph programs for Tensor Processing Unit (TPU) instances.
We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks.
It helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.
arXiv Detail & Related papers (2020-08-03T17:24:52Z)