Related papers: NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models

URL: http://arxiv.org/abs/2502.14482v1
Date: Thu, 20 Feb 2025 12:01:11 GMT
Title: NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models
Authors: Chenlu Guo, Yuan Wu, Yi Chang
Abstract summary: We introduce StructuredLoRA (SLoRA), which investigates adding a small intermediate matrix between the low-rank matrices A and B. Secondly, we propose NyströmLoRA (NLoRA), which leverages Nyström-based initialization for SLoRA to improve its effectiveness and efficiency. Finally, we propose IntermediateTune (IntTune), which explores fine-tuning exclusively on the intermediate matrix of NLoRA to further boost LLM efficiency.
Score: 12.431575579432458
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parameter-efficient fine-tuning (PEFT) is essential for adapting large language models (LLMs), with low-rank adaptation (LoRA) being the most popular approach. However, LoRA suffers from slow convergence, and some recent LoRA variants, such as PiSSA, primarily rely on Singular Value Decomposition (SVD) for initialization, leading to expensive computation. To mitigate these problems, we use the Nyström method, which follows a three-matrix manipulation. We first introduce StructuredLoRA (SLoRA), which investigates adding a small intermediate matrix between the low-rank matrices A and B. Secondly, we propose NyströmLoRA (NLoRA), which leverages Nyström-based initialization for SLoRA to improve its effectiveness and efficiency. Finally, we propose IntermediateTune (IntTune), which explores fine-tuning exclusively on the intermediate matrix of NLoRA to further boost LLM efficiency. We evaluate our methods on five natural language generation (NLG) tasks and eight natural language understanding (NLU) tasks. On GSM8K, SLoRA and NLoRA achieve accuracies of 56.48% and 57.70%, surpassing LoRA by 33.52% and 36.41%, with only 3.67 million additional trainable parameters. IntTune improves average NLG performance over LoRA by 7.45% while using only 1.25% of its parameters. These results demonstrate the efficiency and effectiveness of our approach in enhancing model performance with minimal parameter overhead.
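
Illustrative sketch (not taken from the paper): the abstract describes a three-matrix structure with a small intermediate matrix between A and B, initialized with the Nyström method. The NumPy code below shows one common Nyström/skeleton-style factorization, in which sampled columns, the pseudo-inverse of the sampled intersection block, and sampled rows together approximate a weight matrix. The function names `nystrom_factors` and `three_factor_forward`, the uniform random sampling, and the shape conventions are all assumptions made for illustration; the paper's actual sampling scheme, initialization, and treatment of the frozen weight may differ.

```python
import numpy as np

def nystrom_factors(W, rank, seed=0):
    """Return (A, N, B) such that A @ N @ B approximates W.

    Hypothetical helper: uniform sampling of rows and columns is an
    assumption, not necessarily what the paper uses.
    """
    rng = np.random.default_rng(seed)
    m, n = W.shape
    rows = rng.choice(m, size=rank, replace=False)
    cols = rng.choice(n, size=rank, replace=False)
    A = W[:, cols]                # m x r: sampled columns
    B = W[rows, :]                # r x n: sampled rows
    core = W[np.ix_(rows, cols)]  # r x r block where sampled rows and columns meet
    N = np.linalg.pinv(core)      # small r x r intermediate matrix
    return A, N, B

def three_factor_forward(x, W_frozen, A, N, B):
    """Compute x @ (W_frozen + A @ N @ B) without forming the m x n update."""
    return x @ W_frozen + ((x @ A) @ N) @ B

# Small demo on a matrix that is exactly rank r, where the
# three-factor product reconstructs W up to floating-point error.
rng = np.random.default_rng(0)
m, n, r = 64, 48, 4
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A, N, B = nystrom_factors(W, rank=r)
x = rng.standard_normal((2, m))
y = three_factor_forward(x, W, A, N, B)
print(A.shape, N.shape, B.shape, y.shape)
```

In this sketch the intermediate matrix holds only r x r values, against r x (m + n) for A and B combined, which is the kind of gap that makes training the intermediate matrix alone (as IntTune does, per the abstract) cheap in trainable parameters.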

Related papers

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation [85.89510825889168]
We introduce LoRA-Pre, a novel low-rank system for efficient pre-training. LoRA-Pre decomposes the momentum matrix into a compact low-rank subspace within the online linear learner. We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family.
arXiv Detail & Related papers (2026-02-27T18:57:06Z)
Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights. In this work, we address the gap between training with full steps followed by low-rank projection (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent.
arXiv Detail & Related papers (2026-02-18T13:41:41Z)
Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates [14.49537642990529]
Low-rank adaptation (LoRA) is one of the most popular methods among parameter-efficient fine-tuning (PEFT). We propose a novel method called Dual LoRA to improve the performance by incorporating an inductive bias into the original LoRA. We show that we consistently outperform LoRA and its state-of-the-art variants with the same number of trainable parameters.
arXiv Detail & Related papers (2025-12-03T03:14:09Z)
Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights. There is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved.
arXiv Detail & Related papers (2025-09-24T10:32:50Z)
SingLoRA: Low Rank Adaptation Using a Single Matrix [7.828928639229988]
Low-Rank Adaptation (LoRA) has significantly advanced parameter-efficient fine-tuning of large pretrained models. We propose SingLoRA, which reformulates low-rank adaptation by learning the weight update as a decomposition of a single low-rank matrix multiplied by its transpose.
arXiv Detail & Related papers (2025-07-08T01:11:30Z)
Uni-LoRA: One Vector is All You Need [13.938834666101679]
Low-Rank Adaptation (LoRA) has become the de facto parameter-efficient fine-tuning (PEFT) method for large language models. In this paper, we show that the parameter space reduction strategies employed by these LoRA variants can be formulated within a unified framework. Under the unified view of Uni-LoRA, this design requires only a single trainable vector to reconstruct LoRA parameters for the entire LLM.
arXiv Detail & Related papers (2025-06-01T03:00:09Z)
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models [14.133511131962786]
Low-rank adaptation (LoRA) has been developed as an efficient approach for adapting large language models (LLMs). We introduce Dense Low-Rank Adaptation (DenseLoRA), a novel approach that enhances parameter efficiency while achieving superior performance compared to LoRA. We evaluate DenseLoRA on various benchmarks, showing that it achieves 83.8% accuracy with only 0.01% of trainable parameters, compared to LoRA's 80.8% accuracy with 0.70% of trainable parameters on LLaMA3-8B.
arXiv Detail & Related papers (2025-05-27T08:19:07Z)
Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into LLM sparsity. LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning. LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden.
arXiv Detail & Related papers (2025-02-20T18:37:32Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
LoRA-Mini : Adaptation Matrices Decomposition and Selective Training [2.0670689746336]
Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters. We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts. This approach achieves up to a 20x reduction in the number of trainable parameters compared to standard LoRA while preserving comparable performance.
arXiv Detail & Related papers (2024-11-24T12:21:14Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models [27.757883818520217]
Nested Low-Rank Adaptation (NoRA) is a novel approach to parameter-efficient fine-tuning. By freezing outer LoRA weights and using an inner LoRA design, NoRA enables precise task adaptation with a compact parameter space.
arXiv Detail & Related papers (2024-08-18T12:18:56Z)
LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning. We extend LoRA to multiple scales, dubbed LoRA$^2$.
arXiv Detail & Related papers (2024-08-13T12:31:30Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters [11.23006032094776]
We introduce LoRA-XS, a novel low-rank adaptation method that considerably reduces the trainable parameters while showing superior or competitive performance. LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA.
arXiv Detail & Related papers (2024-05-27T19:07:13Z)
ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA). Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z)
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning [31.036465632204663]
We introduce Chain of LoRA (COLA), an iterative optimization framework inspired by the Frank-Wolfe algorithm. We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z)
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.