Related papers: Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment

Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment

URL: http://arxiv.org/abs/2406.06564v1
Date: Mon, 3 Jun 2024 05:40:34 GMT
Title: Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment
Authors: Kaiye Zhou, Shucheng Wang,
Abstract summary: We introduce a novel parameter-efficient training technique that frequently alters trainable part of parameters, facilitating effective pre-training. Our method achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms during the pre-training phase.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the era of large language models, the demand for efficient use of computational resources has become critically important. Although parameter-efficient fine-tuning techniques have achieved results comparable to full fine-tuning, their application during the pre-training phase poses significant challenges. Specifically, employing parameter-efficient strategies at the onset of pre-training can severely compromise efficiency, especially in larger models. In this paper, building upon the fine-tuning method LoRA, we introduce a novel parameter-efficient training technique that frequently alters trainable part of parameters, facilitating effective pre-training. Our method not only achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms during the pre-training phase but also maintains accuracy levels comparable to those of full pre-training. We provide both theoretical analyses and empirical evidence to demonstrate the effectiveness of our approach.

Related papers

LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning [5.980897761790243]
We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning.<n>LoFT aligns the model's internal dynamics with those of updating all model weights.<n> Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning.
arXiv Detail & Related papers (2025-05-27T14:54:24Z)
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix. A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance. We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation [9.445321300673909]
Low-rank adaption (LoRA) is a prominent method that adds a small number of learnable parameters to the frozen pre-trained weights for fine-tuning. In this work, we introduce Progressive Compression LoRA(PC-LoRA), which simultaneously perform model compression and fine-tuning.
arXiv Detail & Related papers (2024-06-13T13:44:31Z)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [133.45193150403537]
Training Large Language Models (LLMs) presents significant memory challenges due to the growing size of weights and GPU states. In this work, we propose Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy. Our 8-bit GaLore further reduces memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline.
arXiv Detail & Related papers (2024-03-06T07:29:57Z)
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional. We present MELoRA, a mini-ensemble low-rank adapters that uses fewer trainable parameters while maintaining a higher rank. Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA) DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models [45.72323731094864]
Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method. In this work, we study the enhancement of LoRA training by introducing an $r times r$ preconditioner in each gradient step.
arXiv Detail & Related papers (2024-02-04T05:05:43Z)
Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This paper presents the RunLoRA framework for efficient implementations of LoRA. Experiments show up to 28% speedup on language modeling networks.
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)
ReLoRA: High-Rank Training Through Low-Rank Updates [14.606961537327345]
We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. ReLoRA saves up to 5.5Gb of RAM per GPU and improves training speed by 9-40% depending on the model size and hardware setup.
arXiv Detail & Related papers (2023-07-11T18:02:09Z)
LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition into each layer of the Transformer architecture. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.