Related papers: Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

URL: http://arxiv.org/abs/2408.14470v2
Date: Tue, 27 Aug 2024 03:56:11 GMT
Title: Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models
Authors: Aradhye Agarwal, Suhas K Ramesh, Ayan Sengupta, Tanmoy Chakraborty,
Abstract summary: A class of parameter-efficient fine-tuning (PEFT) aims to mitigate computational challenges by selectively fine-tuning only a small fraction of the model parameters. We introduce $textID3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters. We analytically show that $textID3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency.
Score: 18.877891285367216
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning large language models (LLMs) on downstream tasks requires substantial computational resources. A class of parameter-efficient fine-tuning (PEFT) aims to mitigate these computational challenges by selectively fine-tuning only a small fraction of the model parameters. Although computationally efficient, these techniques often fail to match the performance of fully fine-tuned models, primarily due to inherent biases introduced during parameter selection. Traditional selective PEFT techniques use a fixed set of parameters based on a predefined budget (a process also known as unmasking), failing to capture parameter importance dynamically and often ending up exceeding the budget. We introduce $\text{ID}^3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters by balancing exploration and exploitation in parameter selection. Our empirical study on 15 tasks spanning natural language understanding and generative tasks demonstrates the effectiveness of our method compared to fixed-masking-based PEFT techniques. We analytically show that $\text{ID}^3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency. $\text{ID}^3$ is robust to random initialization of neurons and, therefore, can be seamlessly integrated into existing additive and reparametrization-based PEFT modules such as adapters and LoRA for dynamic sparsification.

Related papers

High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning.<n>We present textbfSMoA, a high-rank textbfStructured textbfMOdulation textbfAdapter that uses fewer trainable parameters while maintaining a higher rank.
arXiv Detail & Related papers (2026-01-12T13:06:17Z)
TS-PEFT: Token-Selective Parameter-Efficient Fine-Tuning with Learnable Threshold Gating [8.102270371993411]
We introduce a new paradigm called Token-Selective PEFT (TS-PEFT), in which a function S selectively applies PEFT modifications to a subset of position indices.<n>Our experimental results reveal that the indiscriminate application of PEFT to all indices is not only superfluous, but may also be counterproductive.
arXiv Detail & Related papers (2025-11-20T08:41:20Z)
FPS: Feedforward-based Parameter Selection For Efficient Fine-Tuning [10.272085741815921]
We propose Feedforward-based Selection (FPS) for fine-tuning pre-trained models.<n>FPS ranks parameters by the product of their magnitudes and corresponding input activations, leveraging both pre-trained knowledge and downstream data.<n>FPS achieves performance comparable to state-of-the-art methods while reducing peak memory usage by nearly $9 times$.
arXiv Detail & Related papers (2025-10-31T10:44:16Z)
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? [42.362388367152256]
Large language models (LLMs) are used to fine-tune a parameter-efficient version of Code Llama using LoRA. Our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead.
arXiv Detail & Related papers (2025-04-08T13:15:47Z)
Sparsity May Be All You Need: Sparse Random Parameter Adaptation [7.269130161558109]
Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. We propose reducing the number of trainable parameters by randomly selecting a small proportion of the model parameters to train on.
arXiv Detail & Related papers (2025-02-21T22:23:16Z)
Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model. We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies. Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks. We propose a novel approach that employs a low rank tensor parametrization for model updates. Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models [63.52035708182815]
We introduce a novel Budget-guided Iterative search strategy for automatic PEFT (BIPEFT) BIPEFT employs a new iterative search strategy to disentangle the binary module and rank dimension search spaces. Extensive experiments on public benchmarks demonstrate the superior performance of BIPEFT for downstream tasks with a low parameter budget.
arXiv Detail & Related papers (2024-10-04T18:50:46Z)
Propulsion: Steering LLM with Tiny Fine-Tuning [0.0]
We propose Propulsion, a novel parameter efficient fine-tuning (PEFT) method to optimize task-specific performance. Inspired by the concept of controlled adjustments in physical motion, Propulsion selectively re-scales specific dimensions of a pre-trained model. Our theoretical analysis, supported by Neural Tangent Kernel (NTK) theory, shows that Propulsion approximates the performance of full fine-tuning with far fewer trainable parameters.
arXiv Detail & Related papers (2024-09-17T06:51:59Z)
Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work. Our empirical investigation includes tens of thousands of models trained with all combinations of threes. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models [7.926974917872204]
LoRA-SP is a novel approach utilizing randomized half-selective parameter freezing. LoRA-SP significantly reduces computational and memory requirements without compromising model performance.
arXiv Detail & Related papers (2024-02-28T06:50:10Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models. We also propose a simple pre-training technique that leads to fully or partially shared models. Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation. Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z)
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel visual. sensuous-aware fine-Tuning (SPT) scheme. SPT allocates trainable parameters to task-specific important positions. Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.