Related papers: LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)

LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)

URL: http://arxiv.org/abs/2502.09376v2
Date: Fri, 14 Feb 2025 02:39:04 GMT
Title: LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)
Authors: Junsu Kim, Jaeyeon Kim, Ernest K. Ryu,
Abstract summary: Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models.<n>We show that LoRA training converges to a global minimizer with low rank and small magnitude.<n>We argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region.
Score: 15.381439594872898
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited as prior analyses of LoRA's training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a ``special regime'', which includes idealized setups where linearization arguments hold, and a ``generic regime'' representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges to a global minimizer with low rank and small magnitude, or a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank, small-magnitude region of the parameter space -- where global minima lie -- thus shedding light on why LoRA training usually succeeds in finding global minima.

Related papers

SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization [2.594346658179846]
Low-Rank Adaptation (LoRA) constrains updates to a fixed low-rank subspace.<n>We introduce Subspace Recomposition in Low-Rank Adaptation (SRLoRA) via importance-based fusion and reinitialization.<n> SRLoRA consistently achieves faster convergence and improved accuracy over standard LoRA.
arXiv Detail & Related papers (2025-05-18T14:12:40Z)
LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization [12.504723188498]
Large Language Models (LLMs) have achieved remarkable success in natural language processing. Low-Rank Adaptation (LoRA) has emerged as a practical solution by approximating parameter updates with low-rank matrices. LoRA-GGPO is a novel method that leverages gradient and weight norms to generate targeted perturbations.
arXiv Detail & Related papers (2025-02-20T13:14:41Z)
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks. Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks. We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z)
LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full-finetuning change pre-trained models.<n>We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure.<n>We extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension.
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models. LoRA often under performs when compared to full- parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space.<n>We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
LoRA Learns Less and Forgets Less [25.09261710396838]
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. We compare the performance of LoRA and full finetuning on two target domains, programming and mathematics.
arXiv Detail & Related papers (2024-05-15T19:27:45Z)
PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization [39.30090456724925]
Supervised fine-tuning is the most common method to adapt large language models (LLMs) to downstream tasks. Full fine-tuning requires massive computational resources. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low-dimensional.
arXiv Detail & Related papers (2024-02-25T16:43:41Z)
LoRA Training in the NTK Regime has No Spurious Local Minima [46.46792977614938]
Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models. We theoretically analyze LoRA fine-tuning in the neural tangent kernel regime with $N$ data points.
arXiv Detail & Related papers (2024-02-19T06:22:09Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA) DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)
The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models. This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(fractextembedding size2)$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.