Expanding Sparse Tuning for Low Memory Usage
- URL: http://arxiv.org/abs/2411.01800v1
- Date: Mon, 04 Nov 2024 04:58:20 GMT
- Title: Expanding Sparse Tuning for Low Memory Usage
- Authors: Shufan Shen, Junshu Sun, Xiangyang Ji, Qingming Huang, Shuhui Wang,
- Abstract summary: We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
- Score: 103.43560327427647
- License:
- Abstract: Parameter-efficient fine-tuning (PEFT) is an effective method for adapting pre-trained vision models to downstream tasks by tuning a small subset of parameters. Among PEFT methods, sparse tuning achieves superior performance by only adjusting the weights most relevant to downstream tasks, rather than densely tuning the whole weight matrix. However, this performance improvement has been accompanied by increases in memory usage, which stems from two factors, i.e., the storage of the whole weight matrix as learnable parameters in the optimizer and the additional storage of tunable weight indexes. In this paper, we propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage. To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices, saving from the costly storage of the whole original matrix. A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes. To maintain the effectiveness of sparse tuning with low-rank matrices, we extend the low-rank decomposition by applying nonlinear kernel functions to the whole-matrix merging. Consequently, we gain an increase in the rank of the merged matrix, enhancing the ability of SNELL in adapting the pre-trained models to downstream tasks. Extensive experiments on multiple downstream tasks show that SNELL achieves state-of-the-art performance with low memory usage, endowing PEFT with sparse tuning to large-scale models. Codes are available at https://github.com/ssfgunner/SNELL.
Related papers
- ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization [26.16200284965289]
Large language models achieve state-of-the-art performance but are costly to fine-tune due to their size.
We propose Ultra-Low-dimensional Prompt Tuning (ULPT), which optimize prompts in a low-dimensional space (e.g., 2D) and use a random but frozen matrix for the up-projection.
Our theoretical analysis shows that random projections can capture high-rank structures effectively, and experimental results demonstrate U's competitive performance over existing parameter-efficient methods.
arXiv Detail & Related papers (2025-02-06T21:00:29Z) - Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models.
High memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size.
We propose sparse compression gradient (SGC) to address these limitations.
arXiv Detail & Related papers (2025-02-01T04:18:28Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method.
We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
arXiv Detail & Related papers (2024-09-25T21:32:12Z) - Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse finetuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z) - SPT: Fine-Tuning Transformer-based Language Models Efficiently with
Sparsification [14.559316921646356]
Fine-tuning Transformer-based models for downstream tasks has long running time and high memory consumption.
We propose the SPT system to fine-tune Transformer-based models efficiently by introducing sparsity.
SPT consistently outperforms well-optimized baselines, reducing the peak memory consumption by up to 50% and accelerating fine-tuning by up to 2.2x.
arXiv Detail & Related papers (2023-12-16T07:44:52Z) - Hyperparameter Optimization for Large Language Model Instruction-Tuning [6.743825167463901]
We study the whole pipeline of performing fine-tuning and validation on a pre-trained LLM as a blackbox.
We efficiently explore the space of hyper parameters with the nomad algorithm, achieving a boost in performance and human alignment of the tuned model.
arXiv Detail & Related papers (2023-12-01T22:03:12Z) - Frustratingly Simple Memory Efficiency for Pre-trained Language Models
via Dynamic Embedding Pruning [42.652021176354644]
The memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings.
We propose a simple yet effective approach that leverages this finding to minimize the memory footprint of the embedding matrix.
We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks.
arXiv Detail & Related papers (2023-09-15T19:00:00Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a.
sparse-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.