Full Parameter Fine-tuning for Large Language Models with Limited Resources
- URL: http://arxiv.org/abs/2306.09782v2
- Date: Thu, 6 Jun 2024 13:22:26 GMT
- Title: Full Parameter Fine-tuning for Large Language Models with Limited Resources
- Authors: Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu,
- Abstract summary: Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training.
We propose a new computation, LOw-Memory Optimization (LOMO), which fuses the gradient and the parameter update in one step to reduce memory usage.
- Score: 55.794732214059806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLMs training would encourage greater participation from researchers, benefiting both academia and society. While existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters, few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. By integrating LOMO with existing memory saving techniques, we reduce memory usage to 10.8% compared to the standard approach (DeepSpeed solution). Consequently, our approach enables the full parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090, each with 24GB memory.Code and data are available at https://github.com/OpenLMLab/LOMO.
Related papers
- Scalable MatMul-free Language Modeling [8.672867887354977]
We show that MatMul operations can be completely eliminated from large language models.
Our proposed MatMul-free models achieve performance on-par with state-of-the-art Transformers.
arXiv Detail & Related papers (2024-06-04T17:50:34Z) - GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [133.45193150403537]
Training Large Language Models (LLMs) presents significant memory challenges due to the growing size of weights and GPU states.
In this work, we propose Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy.
Our 8-bit GaLore further reduces memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline.
arXiv Detail & Related papers (2024-03-06T07:29:57Z) - Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse finetuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z) - HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy [55.17502828915191]
We propose a novel-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step.
Our results demonstrate that HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning.
arXiv Detail & Related papers (2024-01-26T21:14:32Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a
Single GPU [4.198627205271621]
We present ASPEN, a framework for fine-tuning transformer-based large language models (LLMs)
ASPEN efficiently trains multiple jobs on a single GPU using the LoRA method, leveraging shared pre-trained model and adaptive scheduling.
Experiments show that ASPEN saves 53% of GPU memory when training multiple LLaMA-7B models on NVIDIA A100 80GB GPU.
arXiv Detail & Related papers (2023-12-05T05:38:38Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources [37.265708531464746]
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.
Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements.
We propose QFT, a novel Quantized Full- parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance.
arXiv Detail & Related papers (2023-10-11T02:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.