GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- URL: http://arxiv.org/abs/2310.00576v1
- Date: Sun, 1 Oct 2023 05:25:24 GMT
- Title: GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- Authors: Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Chia-Yuan
Chang, Xia Hu
- Abstract summary: This paper introduces a novel, simple, and effective method named ``GrowLength'' to accelerate the pretraining process of Large Language Models.
Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency.
- Score: 65.24730341801468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evolving sophistication and intricacies of Large Language Models (LLMs)
yield unprecedented advancements, yet they simultaneously demand considerable
computational resources and incur significant costs. To alleviate these
challenges, this paper introduces a novel, simple, and effective method named
``GrowLength'' to accelerate the pretraining process of LLMs. Our method
progressively increases the training length throughout the pretraining phase,
thereby mitigating computational costs and enhancing efficiency. For instance,
it begins with a sequence length of 128 and progressively extends to 4096. This
approach enables models to process a larger number of tokens within limited
time frames, potentially boosting their performance. In other words, the
efficiency gain comes from training on shorter sequences first, which makes better
use of the available computational resources. Our extensive experiments with various
state-of-the-art LLMs have revealed that models trained using our method not
only converge more swiftly but also exhibit superior performance metrics
compared to those trained with existing methods. Furthermore, our method for
LLMs pretraining acceleration does not require any additional engineering
efforts, making it a practical solution in the realm of LLMs.
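To make the schedule concrete, below is a minimal sketch of a GrowLength-style pretraining loop. It is not the authors' code: the stage boundaries, per-stage token budgets, tokens-per-step target, and the train_step placeholder are illustrative assumptions; the abstract only specifies that the sequence length grows progressively, for example from 128 to 4096.
```python
# Minimal sketch of a progressive sequence-length schedule (illustrative only).
# Stage boundaries, token budgets, and train_step() are assumptions, not the
# authors' implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    seq_len: int     # training sequence length during this stage
    num_tokens: int  # token budget to spend at this length


# Example schedule growing the length from 128 to 4096 (values are assumed).
SCHEDULE: List[Stage] = [
    Stage(seq_len=128, num_tokens=20_000_000),
    Stage(seq_len=512, num_tokens=20_000_000),
    Stage(seq_len=1024, num_tokens=20_000_000),
    Stage(seq_len=4096, num_tokens=20_000_000),
]


def train_step(seq_len: int, batch_size: int) -> None:
    """Placeholder for one optimizer step on a batch of `batch_size`
    sequences of length `seq_len` (forward, loss, backward, update)."""
    pass


def pretrain(tokens_per_step: int = 524_288) -> None:
    # Keep the token count per step roughly constant: shorter sequences are
    # trained with larger batches, and their cheaper self-attention is where
    # the wall-clock savings of the early stages come from.
    for stage in SCHEDULE:
        batch_size = max(1, tokens_per_step // stage.seq_len)
        steps = max(1, stage.num_tokens // (batch_size * stage.seq_len))
        print(f"seq_len={stage.seq_len:5d}  batch_size={batch_size:5d}  steps={steps}")
        for _ in range(steps):
            train_step(stage.seq_len, batch_size)


if __name__ == "__main__":
    pretrain()
```
Because self-attention cost grows with sequence length, the early short-sequence stages process many more tokens per unit of wall-clock time, which matches the efficiency argument made in the abstract.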
Related papers
- Why Does the Effective Context Length of LLMs Fall Short? [68.34573617977013]
In this work, we introduce ShifTed Rotary position embeddING (STRING).
STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths.
Experimental results show that STRING dramatically improves the performance of the latest large-scale models.
arXiv Detail & Related papers (2024-10-24T13:51:50Z) - Achieving Peak Performance for Large Language Models: A Systematic Review [0.0]
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP).
As models grow into the trillion-parameter range, computational and memory costs increase significantly.
This makes it difficult for many researchers to access the resources needed to train or apply these models.
arXiv Detail & Related papers (2024-09-07T13:57:41Z) - LLMs can Schedule [3.435169201271934]
Job shop scheduling problem (JSSP) remains a significant hurdle in optimizing production processes.
This paper explores the potential of Large Language Models (LLMs) for JSSP.
Surprisingly, our findings demonstrate that LLM-based scheduling can achieve performance comparable to other neural approaches.
arXiv Detail & Related papers (2024-08-13T15:53:58Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Sparsity-Accelerated Training for Large Language Models [20.86225596276327]
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks.
LLMs often require additional training, such as continual pre-training and supervised fine-tuning.
This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process.
arXiv Detail & Related papers (2024-06-03T14:56:09Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism [70.07661254213181]
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs).
Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting.
Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead.
arXiv Detail & Related papers (2023-12-08T09:31:50Z) - Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM
Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)