LoRA Training in the NTK Regime has No Spurious Local Minima
- URL: http://arxiv.org/abs/2402.11867v3
- Date: Tue, 28 May 2024 07:05:05 GMT
- Title: LoRA Training in the NTK Regime has No Spurious Local Minima
- Authors: Uijeong Jang, Jason D. Lee, Ernest K. Ryu,
- Abstract summary: Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models.
We theoretically analyze LoRA fine-tuning in the neural tangent kernel regime with $N$ data points.
- Score: 46.46792977614938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
Related papers
- LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization [12.504723188498]
Large Language Models (LLMs) have achieved remarkable success in natural language processing.
Low-Rank Adaptation (LoRA) has emerged as a practical solution by approximating parameter updates with low-rank matrices.
LoRA-GGPO is a novel method that leverages gradient and weight norms to generate targeted perturbations.
arXiv Detail & Related papers (2025-02-20T13:14:41Z) - BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.
We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z) - LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail) [15.381439594872898]
Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models.
We show that LoRA training converges to a global minimizer with low rank and small magnitude.
We argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region.
arXiv Detail & Related papers (2025-02-13T14:45:11Z) - ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws [14.17396731469533]
Low-Rank Adaptation (LoRA) is the bread and butter of Large Language Model finetuning.
We identify three core limitations to LoRA for finetuning--a setting that employs limited amount of data and training steps.
We find an elegant solution: a Dropout-free, scaling-free, LoRA with Adaptive Learning rate--coined ALLoRA.
arXiv Detail & Related papers (2024-10-13T01:57:38Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often under performs when compared to full- parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.
Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.
We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z) - ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA)
Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA.
The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA)
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(fractextembedding size2)$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.