Sparse Gradient Compression for Fine-Tuning Large Language Models
- URL: http://arxiv.org/abs/2502.00311v1
- Date: Sat, 01 Feb 2025 04:18:28 GMT
- Title: Sparse Gradient Compression for Fine-Tuning Large Language Models
- Authors: David H. Yang, Mohammad Mohammadi Amiri, Tejaswini Pedapati, Subhajit Chaudhury, Pin-Yu Chen
- Abstract summary: Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. The high memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. We propose sparse gradient compression (SGC) to address these limitations.
- Score: 58.44973963468691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. However, the high memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. To address this, parameter-efficient fine-tuning (PEFT) methods have been proposed to minimize the number of parameters required for fine-tuning LLMs. However, these approaches often tie the number of optimizer states to the dimensions of the model parameters, limiting flexibility and control during fine-tuning. In this paper, we propose sparse gradient compression (SGC), a training regime designed to address these limitations. Our approach leverages inherent sparsity in gradients to compress optimizer states by projecting them onto a low-dimensional subspace, with dimensionality independent of the original model's parameters. By enabling optimizer state updates in an arbitrary low-dimensional subspace, SGC offers a flexible tradeoff between memory efficiency and performance. We demonstrate through experiments that SGC can decrease memory usage in optimizer states more effectively than existing PEFT methods. Furthermore, by fine-tuning LLMs on various downstream tasks, we show that SGC can deliver superior performance while substantially lowering optimizer state memory requirements, particularly in both data-limited and memory-limited settings.
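The abstract does not spell out the projection itself, so the snippet below is only a minimal sketch of the general idea: keep Adam-style moments in a k-dimensional compressed space, with k chosen independently of the model size. The top-k selection, the `CompressedAdam` name, and the handling of the changing support across steps are illustrative assumptions, not the paper's construction.

```python
import torch

class CompressedAdam:
    """Illustrative only, not the paper's SGC: Adam-style moments kept in a
    k-dimensional compressed space, with k chosen independently of model size."""

    def __init__(self, params, k=1024, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        self.params = [p for p in params if p.requires_grad]
        self.k, self.lr, self.betas, self.eps = k, lr, betas, eps
        # Optimizer state is O(k) per tensor instead of O(p.numel()).
        self.m = [torch.zeros(k, device=p.device) for p in self.params]
        self.v = [torch.zeros(k, device=p.device) for p in self.params]
        self.t = 0

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        for p, m, v in zip(self.params, self.m, self.v):
            g = p.grad.reshape(-1)
            k = min(self.k, g.numel())
            # Exploit gradient sparsity: keep only the k largest-magnitude entries.
            idx = g.abs().topk(k).indices
            g_k = g[idx]
            # Moments live in the compressed k-dimensional space.
            # (The paper's treatment of the changing support is not reproduced here.)
            m[:k].mul_(b1).add_(g_k, alpha=1 - b1)
            v[:k].mul_(b2).addcmul_(g_k, g_k, value=1 - b2)
            m_hat = m[:k] / (1 - b1 ** self.t)
            v_hat = v[:k] / (1 - b2 ** self.t)
            # Map the compressed update back to the selected coordinates.
            upd = torch.zeros_like(g)
            upd[idx] = m_hat / (v_hat.sqrt() + self.eps)
            p.add_(upd.view_as(p), alpha=-self.lr)
```

With k fixed, optimizer-state memory per tensor stays O(k) rather than growing with the parameter count, which is the flexibility the abstract emphasizes.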
Related papers
- COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs [81.01082659623552]
Large Language Models (LLMs) have demonstrated remarkable success across various domains.
Their optimization remains a significant challenge due to the complex and high-dimensional loss landscapes they inhabit.
arXiv Detail & Related papers (2025-02-24T18:42:19Z)
- A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models [22.725326215887435]
We introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning Large Language Models.
Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems.
This structured reduction in dimensionality allows our method to simultaneously reduce memory usage for both activations and optimizer states.
arXiv Detail & Related papers (2025-02-11T03:32:10Z)
- ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization [26.16200284965289]
Large language models achieve state-of-the-art performance but are costly to fine-tune due to their size.
We propose Ultra-Low-dimensional Prompt Tuning (ULPT), which optimizes prompts in a low-dimensional space (e.g., 2D) and uses a random but frozen matrix for the up-projection.
Our theoretical analysis shows that random projections can capture high-rank structures effectively, and experimental results demonstrate ULPT's competitive performance over existing parameter-efficient methods (a minimal sketch of this up-projection appears after this list).
arXiv Detail & Related papers (2025-02-06T21:00:29Z)
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that SubZero enhances fine-tuning and converges faster than standard ZO approaches.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method.
We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation (a minimal CP-style sketch appears after this list).
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity [66.67596152389591]
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models.
In this study, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO.
Our results demonstrate that fine-tuning just 0.1% of sensitive parameters in the LLM with ZO can outperform full-parameter ZO fine-tuning (a minimal zeroth-order sketch appears after this list).
arXiv Detail & Related papers (2024-06-05T04:07:35Z)
- HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy [55.17502828915191]
We propose HiFT, a novel optimizer-independent, end-to-end hierarchical fine-tuning strategy that updates only a subset of parameters at each training step.
Our results demonstrate that HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning.
arXiv Detail & Related papers (2024-01-26T21:14:32Z)
- QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources [37.265708531464746]
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.
Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements.
We propose QFT, a novel Quantized Full-parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance.
arXiv Detail & Related papers (2023-10-11T02:47:40Z)
- Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization [27.79783067245817]
Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs.
This paper presents Parameter-Efficient and Quantization-aware Adaptation (PEQA), a simple yet effective method that combines the advantages of PEFT with quantized LLMs.
arXiv Detail & Related papers (2023-05-23T15:20:01Z)
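For the ULPT entry above, the following is a minimal sketch of an ultra-low-dimensional prompt with a frozen random up-projection. The prompt length, dimensions, scaling, and the `UltraLowDimPrompt` name are illustrative assumptions; any further refinements in the paper are not reproduced.

```python
import torch
import torch.nn as nn

class UltraLowDimPrompt(nn.Module):
    """Illustrative sketch: a tiny trainable prompt in r dimensions, expanded to the
    model dimension by a frozen random projection (only the seed needs storing)."""

    def __init__(self, prompt_len=20, low_dim=2, model_dim=4096, seed=0):
        super().__init__()
        # Trainable state: prompt_len * low_dim scalars (e.g., 20 * 2 = 40).
        self.z = nn.Parameter(torch.zeros(prompt_len, low_dim))
        gen = torch.Generator().manual_seed(seed)
        # Frozen random up-projection; never updated during training.
        self.register_buffer(
            "proj", torch.randn(low_dim, model_dim, generator=gen) / low_dim ** 0.5
        )

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the up-projected soft prompt to the token embeddings
        # (input_embeds: [batch, seq_len, model_dim]).
        prompt = (self.z @ self.proj).unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```

Because the projection can be regenerated from its seed, only the low-dimensional prompt (plus the seed) needs to be stored per task.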
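For the LoRTA entry, here is a minimal sketch of how a CP (sum-of-rank-one) parameterization can share adapter factors across layers. The specific tensorization (layers as the third mode, shared U and V with per-layer scales) and the `CPAdapter` name are assumptions for illustration, not necessarily the paper's exact construction.

```python
import torch
import torch.nn as nn

class CPAdapter(nn.Module):
    """Illustrative CP-style adapter: one shared factor pair (U, V) for all layers,
    plus a small per-layer weight vector, instead of a separate LoRA pair per layer."""

    def __init__(self, num_layers: int, d_out: int, d_in: int, rank: int = 8):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.02)  # shared output-side factor
        self.V = nn.Parameter(torch.zeros(d_in, rank))           # shared input-side factor (zero init => delta starts at 0)
        self.S = nn.Parameter(torch.ones(num_layers, rank))      # per-layer CP weights

    def delta(self, layer: int) -> torch.Tensor:
        # CP reconstruction of one slice of the 3-way (layer, out, in) tensor:
        # delta_W[layer] = sum_r S[layer, r] * outer(U[:, r], V[:, r])
        return (self.U * self.S[layer]) @ self.V.T
```

The trainable parameter count here is (d_out + d_in + num_layers) * rank, versus num_layers * (d_out + d_in) * rank for a separate LoRA pair per layer.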
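For the zeroth-order entries, and the extreme-sparsity one in particular, here is a minimal sketch of a two-point (SPSA-style) zeroth-order step restricted to a masked subset of parameters. `zo_sparse_step`, `loss_fn`, `masks`, and the hyperparameters are hypothetical, and the selection of the "sensitive" 0.1% of parameters is not shown.

```python
import torch

@torch.no_grad()
def zo_sparse_step(model, loss_fn, batch, param_names, masks, lr=1e-6, mu=1e-3):
    """Illustrative two-point zeroth-order step on a masked subset of parameters.
    `masks` are 0/1 tensors marking the small 'sensitive' subset (selection not shown)."""
    named = dict(model.named_parameters())
    params = [named[n] for n in param_names]
    seed = int(torch.randint(0, 2**31 - 1, (1,)).item())

    def perturb(scale):
        # Re-seeding lets us regenerate the same random direction z without storing it.
        torch.manual_seed(seed)
        for p, m in zip(params, masks):
            z = torch.randn_like(p) * m          # perturb only the selected entries
            p.add_(z, alpha=scale * mu)

    perturb(+1)
    loss_plus = float(loss_fn(model, batch))     # forward pass only, no backprop
    perturb(-2)
    loss_minus = float(loss_fn(model, batch))
    perturb(+1)                                  # restore the original weights
    g_scalar = (loss_plus - loss_minus) / (2 * mu)

    torch.manual_seed(seed)                      # same z again for the update direction
    for p, m in zip(params, masks):
        z = torch.randn_like(p) * m
        p.add_(z, alpha=-lr * g_scalar)
```

No backward pass is run and the random direction is regenerated from the seed, so nothing beyond the two loss values and the seed has to persist between forward passes.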