A Survey of Techniques for Optimizing Transformer Inference
- URL: http://arxiv.org/abs/2307.07982v1
- Date: Sun, 16 Jul 2023 08:50:50 GMT
- Title: A Survey of Techniques for Optimizing Transformer Inference
- Authors: Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram
Vishwanath, Arun K. Somani
- Abstract summary: Recent years have seen a phenomenal rise in performance and applications of transformer neural networks.
Transformer-based networks such as ChatGPT have impacted the lives of the general public.
Researchers have proposed techniques to optimize transformer inference at all levels of abstraction.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent years have seen a phenomenal rise in performance and applications of
transformer neural networks. The family of transformer networks, including
Bidirectional Encoder Representations from Transformer (BERT), Generative
Pretrained Transformer (GPT) and Vision Transformer (ViT), have shown their
effectiveness across Natural Language Processing (NLP) and Computer Vision (CV)
domains. Transformer-based networks such as ChatGPT have impacted the lives of
the general public. However, the quest for high predictive performance has led to an
exponential increase in transformers' memory and compute footprint. Researchers
have proposed techniques to optimize transformer inference at all levels of
abstraction. This paper presents a comprehensive survey of techniques for
optimizing the inference phase of transformer networks. We survey techniques
such as knowledge distillation, pruning, quantization, neural architecture
search and lightweight network design at the algorithmic level. We further
review hardware-level optimization techniques and the design of novel hardware
accelerators for transformers. We summarize quantitative results on the
number of parameters/FLOPs and the accuracy of several models/techniques to
showcase the trade-offs they strike. We also outline future directions in
this rapidly evolving field of research. We believe that this survey will
educate both novice and seasoned researchers and also spark a plethora of
research efforts in this field.
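As a concrete taste of one algorithmic-level technique the survey covers, the sketch below applies post-training dynamic quantization to a BERT-style model using PyTorch's public `quantize_dynamic` API. It is a minimal illustration assuming PyTorch and HuggingFace `transformers` are installed; the checkpoint name and the size comparison are illustrative choices, not an excerpt from the paper.
```python
# Minimal sketch of post-training dynamic quantization, one of the
# algorithmic-level techniques surveyed. Checkpoint name is illustrative.
import io
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Dynamic quantization: weights of every nn.Linear are stored in int8;
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Approximate serialized model size in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {size_mb(model):.1f} MB -> int8: {size_mb(quantized):.1f} MB")
```
Dynamic quantization typically shrinks the quantized layers by roughly 4x with little accuracy loss on many NLP tasks, which is why it is a common first step among the inference optimizations surveyed.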
Related papers
- On the Expressive Power of a Variant of the Looped Transformer [83.30272757948829]
We design a novel transformer block, dubbed AlgoFormer, to empower transformers with algorithmic capabilities.
The proposed AlgoFormer achieves significantly higher expressiveness in algorithm representation when using the same number of parameters.
Theoretical and empirical results are presented to show that the designed transformer has the potential to outperform human-designed algorithms.
arXiv Detail & Related papers (2024-02-21T07:07:54Z)
- Transformers in Reinforcement Learning: A Survey [7.622978576824539]
Transformers have impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks.
This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability.
arXiv Detail & Related papers (2023-07-12T07:51:12Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The transformer is a deep neural network that employs a self-attention mechanism to capture contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing across positions, as sketched below.
Our survey identifies the top five application domains for transformer-based models.
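To make the self-attention mechanism referenced here concrete, the following is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes, weights, and inputs are illustrative assumptions, not code from the paper.
```python
# Minimal sketch of scaled dot-product self-attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Shapes and random inputs are illustrative only.
import numpy as np

def self_attention(x: np.ndarray, wq, wk, wv) -> np.ndarray:
    q, k, v = x @ wq, x @ wk, x @ wv             # (seq, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)    # row-wise softmax
    return weights @ v                           # context-mixed values

seq_len, d_model, d_k = 4, 8, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)  # (4, 8): one vector per position
```
Because each position's output is computed against all positions in a single matrix product, the long-range dependencies and the parallelism noted above fall out directly.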
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The compute and memory bandwidth required for inference of recent Transformer models are growing at a significant rate; a rough per-layer cost model is sketched below.
This growth has driven an increased focus on making Transformer models more efficient.
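To ground the scaling claim, here is a back-of-the-envelope estimator using the standard approximations of roughly 12·d² weights per transformer layer (4·d² for the attention projections plus 8·d² for a 4x feed-forward block) and about 2 FLOPs per weight per token; the formulas and example dimensions are common rules of thumb, not figures from the survey.
```python
# Back-of-the-envelope cost model for one standard transformer layer:
# ~4*d^2 attention parameters + ~8*d^2 FFN parameters (4x expansion),
# roughly 2 FLOPs per parameter per token, plus an O(n*d) term for the
# attention map itself. Rules of thumb, not numbers from the survey.
def layer_params(d_model: int) -> int:
    return 4 * d_model**2 + 8 * d_model**2

def flops_per_token(d_model: int, seq_len: int) -> int:
    weight_flops = 2 * layer_params(d_model)   # matmuls against weights
    attn_flops = 4 * seq_len * d_model         # Q K^T and softmax(.) V
    return weight_flops + attn_flops

# Illustrative (d_model, seq_len) pairs in the spirit of BERT/GPT-2/GPT-3.
for d, n in [(768, 512), (1600, 1024), (12288, 2048)]:
    print(f"d={d:>5}, n={n:>4}: "
          f"{layer_params(d)/1e6:8.1f}M params/layer, "
          f"{flops_per_token(d, n)/1e9:7.2f} GFLOPs/token/layer")
```
Even this crude model shows per-layer cost growing quadratically with model width, which is the trend motivating the efficiency work this entry and the main survey review.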
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Vision Transformers for Action Recognition: A Survey [41.69370782177517]
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent work has proven the efficacy of transformers beyond the image domain on numerous video-related tasks.
Human action recognition is receiving special attention from the research community due to its widespread applications.
arXiv Detail & Related papers (2022-09-13T02:57:05Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer (a group-wise projection is sketched below).
We apply LW-Transformer to a set of Transformer-based networks and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that, while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks.
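A hedged sketch of the core idea, reading group-wise transformation in the spirit of group convolution: split the features into g groups and project each group independently, which cuts a dense layer's weights by roughly a factor of g. The class below is an illustration of that reading, not the authors' released code.
```python
# Illustrative group-wise linear transformation: split d features into g
# groups and project each group independently, as in group convolution.
# This is one reading of the idea, not the authors' implementation.
import torch
import torch.nn as nn

class GroupwiseLinear(nn.Module):
    def __init__(self, d_model: int, groups: int):
        super().__init__()
        assert d_model % groups == 0
        self.groups = groups
        self.proj = nn.ModuleList(
            nn.Linear(d_model // groups, d_model // groups)
            for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(self.groups, dim=-1)  # split the feature axis
        return torch.cat([p(c) for p, c in zip(self.proj, chunks)], dim=-1)

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

dense = nn.Linear(512, 512)
grouped = GroupwiseLinear(512, groups=8)
print(count_params(dense), count_params(grouped))  # 262656 vs 33280 (~8x fewer)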
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- A Survey on Visual Transformer [126.56860258176324]
The transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)