Sparsity May Be All You Need: Sparse Random Parameter Adaptation
- URL: http://arxiv.org/abs/2502.15975v1
- Date: Fri, 21 Feb 2025 22:23:16 GMT
- Title: Sparsity May Be All You Need: Sparse Random Parameter Adaptation
- Authors: Jesus Rios, Pierre Dognin, Ronny Luss, Karthikeyan N. Ramamurthy,
- Abstract summary: Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. We propose reducing the number of trainable parameters by randomly selecting a small proportion of the model parameters to train on.
- Score: 7.269130161558109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. Parameter-Efficient Fine-Tuning (PEFT) methods aim to significantly reduce the computational and memory resources needed for fine-tuning these models by training only a small number of parameters instead of all model parameters. Currently, the most popular PEFT method is Low-Rank Adaptation (LoRA), which freezes the parameters of the model to be fine-tuned and introduces a small set of trainable parameters in the form of low-rank matrices. We propose simply reducing the number of trainable parameters by randomly selecting a small proportion of the model parameters to train on. In this paper, we compare the efficiency and performance of our proposed approach with PEFT methods, including LoRA, as well as with full-parameter fine-tuning.
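The core idea above — keep the pretrained weights and update only a small, randomly chosen fraction of their coordinates — can be sketched in a few lines. The following is a minimal, hypothetical PyTorch illustration; the masking granularity, the 1% density, the optimizer choice, and the `add_sparse_random_masks` helper are assumptions for illustration, not the authors' implementation.

```python
import torch

def add_sparse_random_masks(model: torch.nn.Module, density: float = 0.01, seed: int = 0):
    """Attach gradient hooks that zero every gradient entry outside a fixed
    random mask, so only about a `density` fraction of each parameter's
    coordinates is actually updated during fine-tuning."""
    gen = torch.Generator().manual_seed(seed)
    masks = {}
    for name, p in model.named_parameters():
        mask = torch.rand(p.shape, generator=gen) < density      # fixed random selection
        masks[name] = mask
        p.register_hook(lambda g, m=mask: g * m.to(g.device, g.dtype))
    return masks

# Usage (illustrative): roughly 1% of all coordinates receive non-zero updates.
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.GELU(), torch.nn.Linear(768, 768))
masks = add_sparse_random_masks(model, density=0.01)
# weight_decay=0 so coordinates outside the masks are not moved by decay either
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)
loss = model(torch.randn(4, 768)).pow(2).mean()
loss.backward()            # hooks zero the gradients outside the masks
opt.step()
```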
Related papers
- FineGates: LLMs Finetuning with Compression using Stochastic Gates [7.093692674858257]
Large Language Models (LLMs) present significant challenges for full finetuning due to their high computational demands. Lightweight finetuning techniques have been proposed, such as learning low-rank adapter layers. We propose an adaptor model based on stochastic gates that simultaneously sparsifies the frozen base model and provides task-specific adaptation.
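As a rough illustration of the gating idea described above, one common stochastic-gate formulation (a Gaussian relaxation clamped to [0, 1]) can be wrapped around a frozen linear layer. The class names, the gate placement on output channels, and the open-gate penalty are illustrative assumptions, not the paper's implementation.

```python
import math
import torch

class StochasticGate(torch.nn.Module):
    """A common stochastic-gate relaxation: gate = clamp(mu + sigma * noise, 0, 1).
    Gates multiply the frozen layer's output channels; a penalty on the
    probability of a gate being open encourages sparsity."""
    def __init__(self, num_gates: int, sigma: float = 0.5):
        super().__init__()
        self.mu = torch.nn.Parameter(0.5 * torch.ones(num_gates))
        self.sigma = sigma

    def forward(self) -> torch.Tensor:
        noise = torch.randn_like(self.mu) if self.training else 0.0
        return torch.clamp(self.mu + self.sigma * noise, 0.0, 1.0)

    def expected_open(self) -> torch.Tensor:
        # Sum over gates of P(gate > 0) under the Gaussian relaxation.
        return (0.5 * (1 + torch.erf(self.mu / (self.sigma * math.sqrt(2))))).sum()

class GatedFrozenLinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)        # frozen pretrained layer
        self.gate = StochasticGate(base.out_features)

    def forward(self, x):
        return self.gate() * self.base(x)             # gated (possibly zeroed) channels

# Training objective (illustrative): task loss + lambda * gate.expected_open()
```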
arXiv Detail & Related papers (2024-12-17T14:33:05Z)
- Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models [14.762222323897978]
We propose a novel parameter-efficient training (PET) method for large language models.
Unlike prior methods, this subset is not fixed in location; rather, the set of parameters being modified changes over the course of training.
Our method enables a seamless scaling of the subset size across an arbitrary proportion of the total model size.
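One plausible way to realize such a dynamic subset is to periodically re-score coordinates and keep only a fixed budget of them trainable. The gradient-magnitude scoring rule and the `reselect_trainable_subset` helper below are assumptions for illustration, not necessarily the paper's selection criterion.

```python
import torch

def reselect_trainable_subset(model: torch.nn.Module, budget: int) -> dict:
    """Pick a new subset of `budget` individual weights to keep trainable,
    scoring coordinates by current gradient magnitude (illustrative rule)."""
    scores, metas = [], []
    for name, p in model.named_parameters():
        g = p.grad if p.grad is not None else torch.zeros_like(p)
        scores.append(g.abs().flatten())
        metas.append((name, p.shape, p.numel()))
    flat = torch.cat(scores)
    keep = torch.zeros_like(flat, dtype=torch.bool)
    keep[torch.topk(flat, k=min(budget, flat.numel())).indices] = True
    masks, offset = {}, 0
    for name, shape, n in metas:
        masks[name] = keep[offset:offset + n].reshape(shape)
        offset += n
    return masks

# Called every K optimizer steps; each parameter's gradient is then multiplied
# by its current mask before stepping, so the trained subset drifts over time.
```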
arXiv Detail & Related papers (2024-11-13T13:53:10Z)
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method. We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation. Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
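A toy sketch of the CP idea: stack the per-layer weight updates into a third-order tensor and parameterize it with shared rank-1 factors, so parameters grow additively rather than multiplicatively in the mode sizes. The shapes, zero initialization, and sharing pattern below are illustrative assumptions.

```python
import torch

num_layers, d_out, d_in, rank = 12, 768, 768, 4

A = torch.nn.Parameter(torch.randn(num_layers, rank) * 0.02)  # layer factors
B = torch.nn.Parameter(torch.randn(d_out, rank) * 0.02)       # output-dimension factors
C = torch.nn.Parameter(torch.zeros(d_in, rank))               # input factors (zero init => zero update at start)

def layer_update(layer_idx: int) -> torch.Tensor:
    """Materialize the CP update for one layer: sum_r A[l, r] * B[:, r] C[:, r]^T."""
    return torch.einsum('r,or,ir->oi', A[layer_idx], B, C)

delta_W = layer_update(3)     # (d_out, d_in), added to the frozen pretrained weight
```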
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
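Reading the summary above as "adapt the parameters that currently contribute little", one rough sketch is to train only the smallest-magnitude entries of each pretrained weight. The 5% fraction, the thresholding rule, and the omission of any low-rank or progressive schedule are assumptions, not the paper's exact procedure.

```python
import torch

def smallest_magnitude_mask(weight: torch.Tensor, fraction: float = 0.05) -> torch.Tensor:
    """Boolean mask over the `fraction` of entries with the smallest absolute
    value, i.e. the 'ineffective' coordinates of a pretrained weight."""
    k = max(1, int(fraction * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() <= threshold

# During fine-tuning, gradient entries outside the mask are zeroed, so only the
# selected low-magnitude coordinates are adapted:
W = torch.nn.Parameter(torch.randn(768, 768) * 0.02)
mask = smallest_magnitude_mask(W.detach(), fraction=0.05)
W.register_hook(lambda g: g * mask.to(g.dtype))
```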
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the parameterizations and optimizers under study.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
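The reflection at the heart of this family is a Householder transform H = I - 2uu^T/||u||^2 applied multiplicatively to frozen weights, so only the vector u is trained. Below is a minimal, hypothetical sketch with a single reflection on the output side of a linear layer; the placement and initialization are assumptions, not the paper's exact construction.

```python
import torch

class ReflectedLinear(torch.nn.Module):
    """Wraps a frozen linear layer with a trainable hyperplane reflection
    H = I - 2 u u^T / ||u||^2 applied to its weight matrix."""
    def __init__(self, base: torch.nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)                    # frozen pretrained layer
        self.u = torch.nn.Parameter(torch.randn(base.out_features) * 0.01)

    def forward(self, x):
        u = self.u / self.u.norm()                                # unit normal of the hyperplane
        H = torch.eye(u.numel(), device=u.device) - 2.0 * torch.outer(u, u)
        # A reflection has constant distance from the identity, which bounds
        # how strongly the pretrained weights can be perturbed.
        return torch.nn.functional.linear(x, H @ self.base.weight, self.base.bias)
```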
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
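The mini-ensemble can be pictured as a block-diagonal stack of tiny LoRA pairs, one per feature group, so the overall update can reach rank n*r with roughly the parameter count of a single small adapter. A hedged sketch follows; the group count, rank, scaling, and initialization are illustrative choices, and the paper's exact construction may differ.

```python
import torch

class MiniEnsembleLoRA(torch.nn.Module):
    """Block-diagonal ensemble of tiny LoRA adapters: the input features are
    split into `n` groups, each with its own rank-r pair (A_g, B_g)."""
    def __init__(self, base: torch.nn.Linear, n: int = 8, r: int = 1, alpha: float = 8.0):
        super().__init__()
        assert base.in_features % n == 0 and base.out_features % n == 0
        self.base = base.requires_grad_(False)
        self.n, self.scale = n, alpha / r
        self.A = torch.nn.Parameter(torch.randn(n, r, base.in_features // n) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(n, base.out_features // n, r))  # zero init => zero update

    def forward(self, x):
        xg = x.reshape(*x.shape[:-1], self.n, -1)                       # (..., n, in/n)
        delta = torch.einsum('...ni,nri,nor->...no', xg, self.A, self.B)
        return self.base(x) + self.scale * delta.reshape(*x.shape[:-1], -1)
```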
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
- LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B model.
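A tensor-train parameterization of a single weight update might look as follows; the 768 = 32 x 24 factorization, the TT rank, and the zero initialization are illustrative assumptions rather than LoRETTA's exact configuration.

```python
import torch

# Two small TT cores replace a full 768x768 update, with rows and columns
# factored as 768 = 32 * 24.
m1, m2, n1, n2, tt_rank = 32, 24, 32, 24, 4
G1 = torch.nn.Parameter(torch.randn(m1, n1, tt_rank) * 0.02)
G2 = torch.nn.Parameter(torch.zeros(tt_rank, m2, n2))   # zero init => zero update at start

def tt_delta() -> torch.Tensor:
    """Contract the TT cores into the full (768, 768) update matrix:
    delta[(i1,i2), (j1,j2)] = sum_k G1[i1, j1, k] * G2[k, i2, j2]."""
    full = torch.einsum('ajk,kbm->abjm', G1, G2)         # (m1, m2, n1, n2)
    return full.reshape(m1 * m2, n1 * n2)

delta_W = tt_delta()    # added to the frozen pretrained weight
```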
arXiv Detail & Related papers (2024-02-18T01:20:00Z)
- AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning [112.97430455461097]
We propose a general PEFT method that tunes a mixture of adaptation modules introduced in each Transformer layer while keeping most of the PLM weights frozen.
By only tuning 0.1-0.2% of PLM parameters, we show that AdaMix outperforms SOTA parameter-efficient fine-tuning and full model fine-tuning for both NLU and NLG tasks.
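The mixture idea — several small adaptation modules per layer, stochastic routing during training, merging at inference — can be sketched as below. The bottleneck size, the number of adapters, and the output-averaging shortcut at inference are assumptions; the paper merges adapter weights rather than outputs.

```python
import random
import torch

class AdapterMixture(torch.nn.Module):
    """A set of small bottleneck adapters attached around a frozen layer.
    Training picks one adapter at random per forward pass; inference averages
    the adapters (here on their outputs, to keep the sketch short)."""
    def __init__(self, hidden: int, bottleneck: int = 16, num_adapters: int = 4):
        super().__init__()
        self.adapters = torch.nn.ModuleList([
            torch.nn.Sequential(
                torch.nn.Linear(hidden, bottleneck),
                torch.nn.GELU(),
                torch.nn.Linear(bottleneck, hidden),
            )
            for _ in range(num_adapters)
        ])

    def forward(self, x):
        if self.training:                                   # stochastic routing
            return x + self.adapters[random.randrange(len(self.adapters))](x)
        return x + torch.stack([a(x) for a in self.adapters]).mean(dim=0)
```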
arXiv Detail & Related papers (2022-10-31T16:23:36Z)