GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- URL: http://arxiv.org/abs/2310.00576v1
- Date: Sun, 1 Oct 2023 05:25:24 GMT
- Title: GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- Authors: Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Chia-Yuan
Chang, Xia Hu
- Abstract summary: This paper introduces a novel, simple, and effective method named growlength'' to accelerate the pretraining process of Large Language Models.
Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency.
- Score: 65.24730341801468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evolving sophistication and intricacies of Large Language Models (LLMs)
yield unprecedented advancements, yet they simultaneously demand considerable
computational resources and incur significant costs. To alleviate these
challenges, this paper introduces a novel, simple, and effective method named
``\growlength'' to accelerate the pretraining process of LLMs. Our method
progressively increases the training length throughout the pretraining phase,
thereby mitigating computational costs and enhancing efficiency. For instance,
it begins with a sequence length of 128 and progressively extends to 4096. This
approach enables models to process a larger number of tokens within limited
time frames, potentially boosting their performance. In other words, the
efficiency gain is derived from training with shorter sequences optimizing the
utilization of resources. Our extensive experiments with various
state-of-the-art LLMs have revealed that models trained using our method not
only converge more swiftly but also exhibit superior performance metrics
compared to those trained with existing methods. Furthermore, our method for
LLMs pretraining acceleration does not require any additional engineering
efforts, making it a practical solution in the realm of LLMs.
Related papers
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Sparsity-Accelerated Training for Large Language Models [20.86225596276327]
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks.
LLMs often require additional training, such as continual pre-training and supervised fine-tuning.
This paper proposes leveraging emphsparsity in pre-trained LLMs to expedite this training process.
arXiv Detail & Related papers (2024-06-03T14:56:09Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM.
InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention.
Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism [70.07661254213181]
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs)
Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting.
Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead.
arXiv Detail & Related papers (2023-12-08T09:31:50Z) - Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM
Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.