GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- URL: http://arxiv.org/abs/2310.00576v1
- Date: Sun, 1 Oct 2023 05:25:24 GMT
- Title: GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length
- Authors: Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Chia-Yuan
Chang, Xia Hu
- Abstract summary: This paper introduces a novel, simple, and effective method named ``GrowLength'' to accelerate the pretraining process of Large Language Models.
Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency.
- Score: 65.24730341801468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evolving sophistication and intricacies of Large Language Models (LLMs)
yield unprecedented advancements, yet they simultaneously demand considerable
computational resources and incur significant costs. To alleviate these
challenges, this paper introduces a novel, simple, and effective method named
``GrowLength'' to accelerate the pretraining process of LLMs. Our method
progressively increases the training length throughout the pretraining phase,
thereby mitigating computational costs and enhancing efficiency. For instance,
it begins with a sequence length of 128 and progressively extends to 4096. This
approach enables models to process a larger number of tokens within limited
time frames, potentially boosting their performance. In other words, the
efficiency gain comes from training on shorter sequences first, which makes better
use of the available computational resources. Our extensive experiments with various
state-of-the-art LLMs have revealed that models trained using our method not
only converge more swiftly but also exhibit superior performance metrics
compared to those trained with existing methods. Furthermore, our method for
LLMs pretraining acceleration does not require any additional engineering
efforts, making it a practical solution in the realm of LLMs.
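To make the schedule concrete, below is a minimal sketch of a GrowLength-style pretraining loop. It is not the authors' code: the stage boundaries, per-stage token budgets, tokens-per-step target, and the train_step placeholder are illustrative assumptions; the abstract only specifies that the sequence length grows progressively, for example from 128 to 4096.
```python
# Minimal sketch of a progressive sequence-length schedule (illustrative only).
# Stage boundaries, token budgets, and train_step() are assumptions, not the
# authors' implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    seq_len: int     # training sequence length during this stage
    num_tokens: int  # token budget to spend at this length


# Example schedule growing the length from 128 to 4096 (values are assumed).
SCHEDULE: List[Stage] = [
    Stage(seq_len=128, num_tokens=20_000_000),
    Stage(seq_len=512, num_tokens=20_000_000),
    Stage(seq_len=1024, num_tokens=20_000_000),
    Stage(seq_len=4096, num_tokens=20_000_000),
]


def train_step(seq_len: int, batch_size: int) -> None:
    """Placeholder for one optimizer step on a batch of `batch_size`
    sequences of length `seq_len` (forward, loss, backward, update)."""
    pass


def pretrain(tokens_per_step: int = 524_288) -> None:
    # Keep the token count per step roughly constant: shorter sequences are
    # trained with larger batches, and their cheaper self-attention is where
    # the wall-clock savings of the early stages come from.
    for stage in SCHEDULE:
        batch_size = max(1, tokens_per_step // stage.seq_len)
        steps = max(1, stage.num_tokens // (batch_size * stage.seq_len))
        print(f"seq_len={stage.seq_len:5d}  batch_size={batch_size:5d}  steps={steps}")
        for _ in range(steps):
            train_step(stage.seq_len, batch_size)


if __name__ == "__main__":
    pretrain()
```
Because self-attention cost grows with sequence length, the early short-sequence stages process many more tokens per unit of wall-clock time, which matches the efficiency argument made in the abstract.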
Related papers
- Why Does the Effective Context Length of LLMs Fall Short? [68.34573617977013]
In this work, we introduce ShifTed Rotary position embeddING (STRING).
STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths.
Experimental results show that STRING dramatically improves the performance of the latest large-scale models.
arXiv Detail & Related papers (2024-10-24T13:51:50Z) - Achieving Peak Performance for Large Language Models: A Systematic Review [0.0]
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP).
As models grow into the trillion-parameter range, computational and memory costs increase significantly.
This makes it difficult for many researchers to access the resources needed to train or apply these models.
arXiv Detail & Related papers (2024-09-07T13:57:41Z) - LLMs can Schedule [3.435169201271934]
Job shop scheduling problem (JSSP) remains a significant hurdle in optimizing production processes.
This paper explores the potential of Large Language Models (LLMs) for JSSP.
Surprisingly, our findings demonstrate that LLM-based scheduling can achieve performance comparable to other neural approaches.
arXiv Detail & Related papers (2024-08-13T15:53:58Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Sparsity-Accelerated Training for Large Language Models [20.86225596276327]
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks.
LLMs often require additional training, such as continual pre-training and supervised fine-tuning.
This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process.
arXiv Detail & Related papers (2024-06-03T14:56:09Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism [70.07661254213181]
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs).
Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting.
Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead.
arXiv Detail & Related papers (2023-12-08T09:31:50Z) - Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM
Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)