Related papers: Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

URL: http://arxiv.org/abs/2406.02913v1
Date: Wed, 5 Jun 2024 04:07:35 GMT
Title: Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity
Authors: Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu,
Abstract summary: Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models. In this study, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM with ZO can outperform the full ZO fine-tuning performance.
Score: 66.67596152389591
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of "sensitive parameters" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM with ZO can outperform the full ZO fine-tuning performance, while offering wall-clock time speedup. Additionally, we show that ZO fine-tuning targeting these 0.1% sensitive parameters, combined with 4 bit quantization, enables efficient ZO fine-tuning of an Llama2-7B model on a GPU device with less than 8 GiB of memory and notably reduced latency.

Related papers

ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization [26.16200284965289]
Large language models achieve state-of-the-art performance but are costly to fine-tune due to their size. We propose Ultra-Low-dimensional Prompt Tuning (ULPT), which optimize prompts in a low-dimensional space (e.g., 2D) and use a random but frozen matrix for the up-projection. Our theoretical analysis shows that random projections can capture high-rank structures effectively, and experimental results demonstrate U's competitive performance over existing parameter-efficient methods.
arXiv Detail & Related papers (2025-02-06T21:00:29Z)
Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. High memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. We propose sparse compression gradient (SGC) to address these limitations.
arXiv Detail & Related papers (2025-02-01T04:18:28Z)
Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models [33.911521719528686]
Fine-tuning is powerful for adapting large language models to downstream tasks, but it often results in huge memory usages. A promising approach is using Zeroth-Order (ZO) gradients, which estimates to replace First-Order (FO) gradients. We introduce a novel layer-wise sparse computation and memory efficient ZO, named LeZO.
arXiv Detail & Related papers (2024-10-13T12:47:37Z)
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZOZO) optimization methods offer a memory-efficient alternative. We show that SubZero enhances fine-tuning and achieves faster results compared to standard ZOZO approaches.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning [67.44661423463927]
This paper introduces Sparse MeZO, a memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We show that Sparse-MeZO consistently improves both performance and convergence speed over MeZO without any overhead.
arXiv Detail & Related papers (2024-02-24T07:22:04Z)
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy [55.17502828915191]
We propose a novel-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. Our results demonstrate that HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning.
arXiv Detail & Related papers (2024-01-26T21:14:32Z)
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. FedKSeed employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources [37.265708531464746]
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks. Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements. We propose QFT, a novel Quantized Full- parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance.
arXiv Detail & Related papers (2023-10-11T02:47:40Z)
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression [76.73007709690306]
We introduce the Sparse-Quantized Representation (SpQR), a new compressed format and quantization technique. SpQR achieves relative accuracy losses of less than 1% in perplexity for highly-accurate LLaMA and Falcon LLMs. This makes it possible to run 33B parameter LLM on a single 24 GB consumer GPU without any performance degradation at 15% speedup.
arXiv Detail & Related papers (2023-06-05T17:53:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.