AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models
- URL: http://arxiv.org/abs/2403.13269v3
- Date: Tue, 16 Apr 2024 17:37:12 GMT
- Title: AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models
- Authors: Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, Peter Anthony Beerel
- Abstract summary: We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed Adaptive Freezing of Low Rank Adaptation (AFLoRA).
Specifically, we add a parallel path of trainable low-rank matrices, namely a down-projection and an up-projection matrix, each of which is followed by a feature transformation vector.
Our experimental results demonstrate that we can achieve state-of-the-art performance with an average improvement of up to $0.85\%$ as evaluated on the GLUE benchmark.
- Score: 5.981614673186146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed Adaptive Freezing of Low Rank Adaptation (AFLoRA). Specifically, for each pre-trained frozen weight tensor, we add a parallel path of trainable low-rank matrices, namely a down-projection and an up-projection matrix, each of which is followed by a feature transformation vector. Based on a novel freezing score, we then incrementally freeze these projection matrices during fine-tuning to reduce the computation and alleviate over-fitting. Our experimental results demonstrate that we can achieve state-of-the-art performance with an average improvement of up to $0.85\%$ as evaluated on the GLUE benchmark while yielding up to $9.5\times$ fewer average trainable parameters. When compared in terms of runtime, AFLoRA can yield up to $1.86\times$ improvement as opposed to similar PEFT alternatives. Besides the practical utility of our approach, we provide insights on the trainability requirements of LoRA paths at different modules and the freezing schedule for the different projection matrices. Code will be released.
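
The structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `AdapterPath`, the vector names `s_down` and `s_up`, the initialization, and the threshold rule in `maybe_freeze` are all assumptions made for illustration. In particular, the paper's freezing score is not specified in the abstract, so the scores below are placeholder numbers, and keeping the feature transformation vectors trainable after freezing is an assumption.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class AdapterPath:
    """Hypothetical low-rank path: x -> s_up * (B @ (s_down * (A @ x)))."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Down-projection (rank x d_in), small random init (assumed).
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # Up-projection (d_out x rank), zero init so the path starts as a no-op (assumed).
        self.B = [[0.0] * rank for _ in range(d_out)]
        # Feature transformation vectors following each projection.
        self.s_down = [1.0] * rank
        self.s_up = [1.0] * d_out
        self.trainable = {"A": True, "B": True, "s_down": True, "s_up": True}

    def forward(self, x):
        h = [s * a for s, a in zip(self.s_down, matvec(self.A, x))]
        return [s * b for s, b in zip(self.s_up, matvec(self.B, h))]

    def num_trainable(self):
        sizes = {
            "A": len(self.A) * len(self.A[0]),
            "B": len(self.B) * len(self.B[0]),
            "s_down": len(self.s_down),
            "s_up": len(self.s_up),
        }
        return sum(n for name, n in sizes.items() if self.trainable[name])

    def maybe_freeze(self, scores, threshold):
        """Freeze a projection matrix once its placeholder score drops below
        the threshold. Only A and B are candidates; the vectors stay trainable."""
        for name in ("A", "B"):
            if self.trainable[name] and scores[name] < threshold:
                self.trainable[name] = False


path = AdapterPath(d_in=8, d_out=8, rank=2)
before = path.num_trainable()  # A (16) + B (16) + s_down (2) + s_up (8)
# Placeholder scores: only A falls below the threshold and gets frozen.
path.maybe_freeze({"A": 0.01, "B": 0.5}, threshold=0.1)
after = path.num_trainable()
out = path.forward([1.0] * 8)
print(before, after, len(out))
```

In this toy setting, freezing the down-projection removes its 16 entries from the trainable count while the two vectors remain trainable, which is the general mechanism by which incremental freezing shrinks the trainable parameter budget during fine-tuning.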
 
      
        Related papers
        - TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models [0.135975510645475]
 TLoRA is a novel tri-matrix low-rank adaptation method.
We show that TLoRA achieves comparable performance to existing low-rank methods.
 arXiv  Detail & Related papers  (2025-04-25T23:11:10Z)
- VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models [0.8875650122536799]
 We introduce VectorFit, a new way of parameterization that efficiently utilizes the existing knowledge embedded in $W$ by adaptively training their singular vectors and biases.
We show that utilizing the structural and transformational properties of $W$ in this way can lead to high-rank incremental weight matrices $\Delta W$, comparable to that of full fine-tuning.
 arXiv  Detail & Related papers  (2025-03-25T10:36:27Z)
- DiffoRA: Enabling Parameter-Efficient Fine-Tuning via Differential Module Selection [32.369133126167085]
 Low-Rank Adaptation (LoRA) has gained popularity for its streamlined design by incorporating low-rank matrices into existing pre-trained models.
We propose DiffoRA, which enables adaptive adoption of the low-rank decomposition matrices.
 arXiv  Detail & Related papers  (2025-02-13T02:41:34Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
 ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
 arXiv  Detail & Related papers  (2024-12-11T12:31:30Z)
- LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models [21.889177019111525]
 Training large models with millions or even billions of parameters from scratch incurs substantial computational costs.
We use Low-Rank Adaptation (LoRA) to adapt only a reduced number of parameters to specific tasks with gradient-based optimizers.
We propose robust approaches that work well across a vast range of well-established computer vision and language models.
 arXiv  Detail & Related papers  (2024-10-15T12:41:31Z)
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
 As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that SubZero enhances fine-tuning and achieves faster results compared to standard ZO approaches.
 arXiv  Detail & Related papers  (2024-10-11T17:01:43Z)
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
 Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
 arXiv  Detail & Related papers  (2024-10-05T06:59:50Z)
- NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
 Fine-tuning pre-trained models often yields state-of-the-art performance but is computationally expensive when updating all parameters.
We propose NEAT, a nonlinear PEFT approach that employs a lightweight neural network to learn a nonlinear transformation of the pre-trained weights.
Our theoretical analysis shows that NEAT achieves greater efficiency than LoRA while maintaining equivalent expressivity.
 arXiv  Detail & Related papers  (2024-10-02T17:29:23Z)
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
 We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both singular values and their basis vectors of pretrained weights.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
 arXiv  Detail & Related papers  (2024-05-31T17:43:35Z)
- SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors [80.6043267994434]
 We propose SVFT, a simple approach that fundamentally differs from existing methods.
 SVFT updates $W$ as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations.
Experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters.
 arXiv  Detail & Related papers  (2024-05-30T01:27:43Z)
- AffineQuant: Affine Transformation Quantization for Large Language Models [58.45460102764]
 Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its compression efficiency and cost-effectiveness in the context of training.
Existing PTQ methods for Large-scale Language Models (LLMs) limit the optimization scope to scaling transformations between pre- and post-quantization weights.
In this paper, we advocate for direct optimization using equivalent affine transformations in PTQ (AffineQuant).
 arXiv  Detail & Related papers  (2024-03-19T08:40:21Z)
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors [30.224822087562163]
 Low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters.
LoRA restricts overall weight update matrices to be low-rank, limiting the model performance.
We propose Flora, which is able to achieve high-rank updates by resampling the projection matrices.
 arXiv  Detail & Related papers  (2024-02-05T18:50:39Z)
- Generative Parameter-Efficient Fine-Tuning [8.481707805559589]
 GIFT learns to generate the fine-tuned weights for a layer directly from its pretrained weights.
We show this formulation bridges parameter-efficient fine-tuning and representation fine-tuning.
 arXiv  Detail & Related papers  (2023-12-01T16:33:57Z)
- AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization [104.96004056928474]
 We propose a class of faster adaptive gradient descent methods for non-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with the mini-batch size $O(\kappa)$.
 arXiv  Detail & Related papers  (2021-06-30T14:47:09Z)
- Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications [5.660384137948734]
 We show that the proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
 arXiv  Detail & Related papers  (2020-06-29T20:57:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     