Greenformers: Improving Computation and Memory Efficiency in Transformer
Models via Low-Rank Approximation
- URL: http://arxiv.org/abs/2108.10808v1
- Date: Tue, 24 Aug 2021 15:51:40 GMT
- Title: Greenformers: Improving Computation and Memory Efficiency in Transformer
Models via Low-Rank Approximation
- Authors: Samuel Cahyawijaya
- Abstract summary: We introduce Greenformers, a collection of methods to improve the efficiency of transformer models.
We propose a low-rank factorization approach to improve the efficiency of the transformer model called Low-Rank Transformer.
We show that Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size.
- Score: 3.3576886095389296
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this thesis, we introduce Greenformers, a collection of methods
to improve the efficiency of the recently renowned transformer models via a
low-rank approximation approach. The development trend of deep learning tends
to result in ever larger and more complex models. Although this leads to better
and more accurate predictions, the resulting models become even more costly, as
they require weeks of training with a huge amount of GPU resources. In
particular, the size and computational cost of transformer-based models have
increased tremendously since their debut in 2017, from ~100 million parameters
up to ~1.6 trillion parameters in early 2021. These computationally hungry
models also incur a substantial environmental cost, reaching an alarming level
of carbon footprint. Some of these models are so massive that it is impossible
to run them without a GPU cluster.
Greenformers improves the efficiency of transformer models by applying low-rank
approximation approaches. Specifically, we propose a low-rank factorization
approach, the Low-Rank Transformer (LRT), to improve the efficiency of the
transformer model. We further compare our model with an existing low-rank
factorization approach called Linformer. Based on our analysis, the Low-Rank
Transformer model is suitable for improving both time and memory efficiency
when processing short-sequence (<= 512) input data, while the Linformer model
is suitable for improving efficiency when processing long-sequence (>= 512)
input data. We also show that the Low-Rank Transformer is more suitable for
on-device deployment, as it significantly reduces the model size. Additionally,
we estimate that applying LRT to the existing BERT-base model can reduce the
computational, economic, and environmental costs of developing such models by
more than 30%.
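The Low-Rank Transformer idea described above, factorizing each dense projection matrix into two thinner matrices of rank r, can be illustrated with a short sketch. The snippet below is a hypothetical PyTorch illustration, not the thesis's released code; the class name, the rank of 128, and the use of BERT-base's 768-dimensional hidden size are assumptions made for the example.

```python
from typing import Optional

import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Approximates a dense nn.Linear(d_in, d_out) with two rank-r factors."""

    def __init__(self, d_in: int, d_out: int, rank: int, bias: bool = True):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_out, bias=bias)    # r -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


def weight_params(d_in: int, d_out: int, rank: Optional[int] = None) -> int:
    """Weight parameters of a full layer (d_in*d_out) vs. a rank-r factorization."""
    return d_in * d_out if rank is None else rank * (d_in + d_out)


if __name__ == "__main__":
    d, r = 768, 128                                    # BERT-base hidden size, assumed rank
    full, low = weight_params(d, d), weight_params(d, d, r)
    print(f"full: {full}  low-rank: {low}  saving: {1 - low / full:.1%}")
    layer = LowRankLinear(d, d, r)
    print(layer(torch.randn(2, 16, d)).shape)          # torch.Size([2, 16, 768])
```

In a full transformer, the same factorization would be applied to the attention and feed-forward projections; the parameter and matrix-multiplication savings grow as the rank shrinks relative to the hidden size.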
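For contrast, the Linformer approach that the thesis compares against keeps the weight matrices dense but projects keys and values along the sequence dimension, so the attention cost scales with the projected length rather than the full sequence length, which is why it pays off mainly for long (>= 512) inputs. The sketch below is again a hypothetical illustration; the tensor shapes and the projected length of 256 are assumptions.

```python
import torch
import torch.nn.functional as F


def linformer_attention(q, k, v, E, Fp):
    """Single-head attention with Linformer-style sequence-length projection.

    q, k, v: (batch, n, d); E, Fp: (k_proj, n) learned projection matrices.
    """
    k_proj = torch.matmul(E, k)                        # (batch, k_proj, d)
    v_proj = torch.matmul(Fp, v)                       # (batch, k_proj, d)
    scores = q @ k_proj.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (batch, n, k_proj)
    return F.softmax(scores, dim=-1) @ v_proj          # (batch, n, d)


if __name__ == "__main__":
    batch, n, d, kp = 2, 1024, 64, 256                 # long sequence projected to 256
    q, k, v = (torch.randn(batch, n, d) for _ in range(3))
    E, Fp = torch.randn(kp, n), torch.randn(kp, n)
    print(linformer_attention(q, k, v, E, Fp).shape)   # torch.Size([2, 1024, 64])
```

Because an (n x k_proj) score matrix replaces the (n x n) one, time and memory grow linearly in the sequence length, matching the abstract's observation that Linformer shines on long-sequence inputs.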
Related papers
- PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - READ: Recurrent Adaptation of Large Transformers [6.0031415516812725]
Fine-tuning large-scale Transformers becomes impractical as the model size and number of tasks increase.
We introduce REcurrent ADaption (READ), a lightweight and memory-efficient fine-tuning method.
arXiv Detail & Related papers (2023-05-24T16:59:41Z) - Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of
Language Model [92.55145016562867]
We propose a new family of unbiased estimators called WTA-CRS for approximate matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z) - Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference via conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Train Large, Then Compress: Rethinking Model Size for Efficient Training
and Inference of Transformers [94.43313684188819]
We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute.
We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps.
This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models.
arXiv Detail & Related papers (2020-02-26T21:17:13Z)