EFlat-LoRA: Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
- URL: http://arxiv.org/abs/2508.00522v1
- Date: Fri, 01 Aug 2025 10:59:49 GMT
- Title: EFlat-LoRA: Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
- Authors: Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang,
- Abstract summary: We propose Flat-LoRA and its efficient version i.e., EFlat-LoRA, to seek flat minima for low-rank adaptation (LoRA)<n>We show that EFlat-LoRA achieves optimize efficiency comparable to that of LoRA while simultaneously attaining comparable or even better performance.
- Score: 21.19636109010622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Little research explores the correlation between the expressive ability and generalization ability of the low-rank adaptation (LoRA). Sharpness-Aware Minimization (SAM) improves model generalization for both Convolutional Neural Networks (CNNs) and Transformers by encouraging convergence to locally flat minima. However, the connection between sharpness and generalization has not been fully explored for LoRA due to the lack of tools to either empirically seek flat minima or develop theoretical methods. In this work, we propose Flat-LoRA and its efficient version i.e., EFlat-LoRA, to seek flat minima for LoRA. Concretely, we theoretically demonstrate that perturbations in the full parameter space can be transferred to the low-rank subspace. This approach eliminates the potential interference introduced by perturbations across multiple matrices in the low-rank subspace. Our extensive experiments on large language models and vision-language models demonstrate that EFlat-LoRA achieves optimize efficiency comparable to that of LoRA while simultaneously attaining comparable or even better performance. For example, on the GLUE dataset with RoBERTa-large, EFlat-LoRA outperforms LoRA and full fine-tuning by 1.0% and 0.5% on average, respectively. On vision-language models e.g., Qwen-VL-Chat shows performance improvements of 1.5% and 1.0% on SQA and VizWiz datasets, respectively. These empirical results also verify that the generalization of LoRA is closely related to sharpness, which is omitted by previous methods.
Related papers
- LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization [12.504723188498]
Low-Rank Adaptation (LoRA) methods exhibit "double descent" in training loss as increases.<n>LoRA-MGPO is a novel LoRA-based framework incorporating Momentum-Guided Perturbation optimization (MGPO)<n> Experiments on natural language understanding and generation benchmarks demonstrate that LoRA-MGPO outperforms LoRA and state-of-the-art PEFT methods.
arXiv Detail & Related papers (2025-02-20T13:14:41Z) - BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.<n>We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z) - SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks.<n>Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks.<n>We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z) - GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning [2.7446241148152253]
Fine-tuning large language models (LLMs) is computationally intensive because it requires updating all parameters.<n>Low-Rank Adaptation (LoRA) improves efficiency by modifying only a subset of weights but introduces a trade-off between expressivity and computational cost.<n>We propose Geometric Low-Rank Adaptation (GeLoRA), a novel framework that computes the intrinsic dimensionality of hidden state representations to adaptively select LoRA ranks.
arXiv Detail & Related papers (2024-12-12T13:04:54Z) - LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement [5.162783756846019]
Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning.<n>Low-Rank Adaptation (LoRA) methods like Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices for tuning fewer parameters.<n>LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-11-22T14:19:01Z) - Exploring Gradient Subspaces: Addressing and Overcoming LoRA's Limitations in Federated Fine-Tuning of Large Language Models [19.533062623518674]
This paper critically analyzes the convergence and performance guarantees of popular FL frameworks utilizing Low-Rank Adaptation (LoRA)<n>We demonstrate that direct weight averaging outperforms LoRA-based strategies, leading to superior performance for fine-tuned models.<n>Our findings show that GaLore along with direct-weight aggregation is a more effective approach, outperforming federated LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities.
arXiv Detail & Related papers (2024-10-30T15:23:44Z) - LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full-finetuning change pre-trained models.<n>We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure.<n>We extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension.
arXiv Detail & Related papers (2024-10-28T17:14:01Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space.<n>We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.<n>Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.<n>We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z) - The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(fractextembedding size2)$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.