GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
- URL: http://arxiv.org/abs/2412.19820v1
- Date: Sun, 15 Dec 2024 12:28:13 GMT
- Title: GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
- Authors: Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He,
- Abstract summary: We propose GaLore$+$, which uses cross-head low-rank projection to reduce the substantial time consumption in estimating low-rank projections for multi-head attention.
Our experiments demonstrate that GaLore$+$ delivers superior performance while achieving approximately $4\times$ fine-tuning speed compared to vanilla GaLore.
- Score: 17.33732087380253
- License:
- Abstract: Recent low-rank training methods, such as GaLore, have significantly reduced the memory required to optimize large language models (LLMs). However, these methods often suffer from time-consuming low-rank projection estimations. In particular, the singular value decomposition (SVD) in GaLore can consume more than 80\% of the total training time. To address this issue, we propose GaLore$+$, which uses cross-head low-rank projection to reduce the substantial time consumption in estimating low-rank projections for multi-head attention. In addition, we employ randomized subspace iteration to achieve fast SVD. To further enhance performance, we propose sparsely coded residuals to reduce the errors caused by low-rank approximation on the first- and second-order moments of the optimizers and weight updates. We evaluate GaLore$+$ on arithmetic reasoning and natural language generation datasets. Our experiments demonstrate that GaLore$+$ delivers superior performance while achieving approximately $4\times$ fine-tuning speed compared to vanilla GaLore.
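Two of the mechanisms named in the abstract are easy to sketch in isolation. The first is a GaLore-style low-rank projection of a weight gradient in which the projector comes from randomized subspace iteration instead of a full SVD. The second is a sparse correction of the low-rank approximation error. The following is a minimal NumPy sketch under stated assumptions and is not the GaLore$+$ implementation. The oversampling and iteration counts are arbitrary, the cross-head sharing of projections across attention heads is not modeled, and `sparse_residual` assumes that "sparsely coded residuals" means keeping the k largest-magnitude entries of the residual. The abstract does not specify the encoding, and it also applies residuals to the optimizer moments, which this sketch omits. All function names are hypothetical.

```python
import numpy as np


def randomized_range(G, rank, n_iter=2, oversample=4, seed=0):
    """Approximate the top-`rank` left singular vectors of G by randomized subspace iteration."""
    rng = np.random.default_rng(seed)
    m, n = G.shape
    k = min(rank + oversample, m, n)
    # Random test matrix: G @ Omega samples the column space of G.
    Omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(G @ Omega)
    # Subspace (power) iterations, re-orthonormalized each time for numerical stability.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(G.T @ Q)
        Q, _ = np.linalg.qr(G @ Z)
    # A small SVD of the k x n matrix Q^T G rotates Q onto approximate singular vectors.
    U_small, _, _ = np.linalg.svd(Q.T @ G, full_matrices=False)
    return (Q @ U_small)[:, :rank]


def project_gradient(G, P):
    """GaLore-style projection: the optimizer only sees the rank x n matrix P^T G."""
    return P.T @ G


def sparse_residual(G, P, k):
    """Assumed sparse code: indices and values of the k largest-magnitude entries of G - P P^T G."""
    resid = (G - P @ (P.T @ G)).ravel()
    idx = np.argpartition(np.abs(resid), -k)[-k:]
    return idx, resid[idx]


# Demo on a synthetic gradient that is rank 8 up to small noise.
rng = np.random.default_rng(1)
G = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32))
G += 1e-3 * rng.standard_normal((64, 32))

P = randomized_range(G, rank=8)          # 64 x 8 projector with orthonormal columns
R = project_gradient(G, P)               # 8 x 32 compact gradient seen by the optimizer
update = P @ R                           # back-projected low-rank update
idx, vals = sparse_residual(G, P, k=16)  # 16 flat indices that correct the largest errors
```

Replacing a full SVD of the m x n gradient with a few QR factorizations of tall, thin m x k and n x k matrices is the kind of saving the abstract attributes to randomized subspace iteration. How good the estimated subspace is depends on the oversampling and on how quickly the gradient's singular values decay.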
Related papers
- Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into LLM sparsity.
LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning.
LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without adding any inference overhead.
Date: 2025-02-20T18:37:32Z
- SubTrack your Grad: Gradient Subspace Tracking for Memory and Time Efficient Full-Parameter LLM Training [6.057289837472806]
Training Large Language Models (LLMs) demands significant time and computational resources due to their large model sizes and states.
We propose SubTrack-Grad, a subspace tracking-based optimization method that efficiently tracks the evolving gradient subspace.
SubTrack-Grad delivers better or on-par results compared to GaLore, while significantly outperforming BAdam.
Date: 2025-02-03T18:13:52Z
- COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection [11.655821671462427]
We present COAP, a memory-efficient method that minimizes computational overhead while maintaining training performance.
For LLaMA-1B, it reduces memory by 61% with only 2% additional time cost, achieving the same PPL as AdamW.
With 8-bit quantization, COAP cuts memory by 81% and achieves a 4x speedup over GaLore for LLaVA-v1.5-7B fine-tuning, while delivering higher accuracy.
Date: 2024-11-26T03:50:52Z
- Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning [1.3597551064547502]
GaLore allows full-parameter learning while being more memory-efficient.
This work introduces Natural GaLore, which efficiently applies the inverse Empirical Fisher Information Matrix to low-rank gradients.
Date: 2024-10-21T14:05:06Z
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that the proposed method, SubZero, enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches.
Date: 2024-10-11T17:01:43Z
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients [86.40635601953446]
We introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection.
We demonstrate that Q-GaLore achieves highly competitive performance with exceptional memory efficiency.
Date: 2024-07-11T08:42:58Z
- OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning [18.102930806071978]
Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore) is a memory-efficient fine-tuning approach.
OwLore consistently outperforms baseline approaches, including full fine-tuning.
Date: 2024-05-28T17:22:22Z
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
Date: 2024-05-06T16:03:32Z
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [133.45193150403537]
Training Large Language Models (LLMs) presents significant memory challenges due to the growing size of weights and optimizer states.
In this work, we propose Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy.
Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline.
Date: 2024-03-06T07:29:57Z
- Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse fine-tuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
Date: 2024-01-29T18:43:49Z
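The SpIEL summary describes a concrete data structure: an array of parameter indices together with the deltas of those parameters relative to their pretrained values. Below is a minimal sketch of that representation. It assumes a fixed, unique index set and a plain SGD step on the tracked coordinates. How SpIEL chooses, grows, or drops indices is not described in the summary above and is not modeled here, and the class name `SparseDelta` is hypothetical.

```python
import numpy as np


class SparseDelta:
    """Hypothetical sparse fine-tuning state: flat parameter indices plus deltas from pretrained values."""

    def __init__(self, shape, indices):
        self.shape = tuple(shape)
        # Indices into the flattened weight matrix; assumed unique.
        self.indices = np.asarray(indices, dtype=np.int64)
        # One delta per tracked parameter, starting at the pretrained value (delta = 0).
        self.deltas = np.zeros(len(self.indices))

    def sgd_step(self, grad, lr):
        # Only tracked coordinates are updated; all other weights stay frozen.
        self.deltas -= lr * grad.ravel()[self.indices]

    def merge(self, pretrained):
        # Materialize dense weights = pretrained + scattered deltas, leaving `pretrained` untouched.
        W = pretrained.astype(float).ravel().copy()
        W[self.indices] += self.deltas
        return W.reshape(self.shape)


W0 = np.arange(12.0).reshape(3, 4)
delta = SparseDelta(W0.shape, indices=[1, 5, 10])
delta.sgd_step(np.ones_like(W0), lr=0.5)
W = delta.merge(W0)  # entries 1, 5, 10 become 0.5, 4.5, 9.5; the rest equal W0
```

Only `len(indices)` deltas, plus any optimizer state for them, need to be stored during training. `merge` produces a dense weight matrix only when one is required, for example for inference.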