Related papers: Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models

Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models

URL: http://arxiv.org/abs/2508.19564v1
Date: Wed, 27 Aug 2025 04:46:56 GMT
Title: Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
Authors: Yuhang Liu, Tao Li, Zhehao Huang, Zuopeng Yang, Xiaolin Huang,
Abstract summary: Sharpness-Aware Minimization (SAM) has proven effective in improving generalization by seeking flat minima.<n>We propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations.<n>Bi-LoRA captures broader sharpness for achieving flatter minima while remaining memory-efficient.
Score: 33.28146211296799
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning large-scale pre-trained models with limited data presents significant challenges for generalization. While Sharpness-Aware Minimization (SAM) has proven effective in improving generalization by seeking flat minima, its substantial extra memory and computation overhead make it impractical for large models. Integrating SAM with parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) is a promising direction. However, we find that directly applying SAM to LoRA parameters limits the sharpness optimization to a restricted subspace, hindering its effectiveness. To address this limitation, we propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations. It decouples SAM's weight perturbations from LoRA optimization: the primary LoRA module adapts to specific tasks via standard gradient descent, while the auxiliary module captures the sharpness of the loss landscape through gradient ascent. Such dual-module design enables Bi-LoRA to capture broader sharpness for achieving flatter minima while remaining memory-efficient. Another important benefit is that the dual design allows for simultaneous optimization and perturbation, eliminating SAM's doubled training costs. Extensive experiments across diverse tasks and architectures demonstrate Bi-LoRA's efficiency and effectiveness in enhancing generalization.

Related papers

Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models [14.755143405057929]
Fine-tuning large vision models (LVMs) and large language models (LLMs) under differentially private learning (DPFL) is hindered by a fundamental privacy-utility trade-off.<n>Low-Rank Adaptation (LoRA), a promising parameter-efficient fine-tuning (PEFT) method, reduces computational and communication costs by introducing two trainable low-rank matrices while freezing pre-trained weights.<n>We propose LA-LoRA, a novel approach that decouples gradient interactions and aligns update directions across clients to enhance robustness under stringent privacy constraints.
arXiv Detail & Related papers (2026-02-23T15:05:28Z)
Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning [52.63618112418439]
Sharpness-aware computation (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning.<n>We propose an approach SL-SAM to break this bottleneck by introducing the sparse technique to layers.
arXiv Detail & Related papers (2026-02-10T04:05:43Z)
EFlat-LoRA: Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond [21.19636109010622]
We propose Flat-LoRA and its efficient version i.e., EFlat-LoRA, to seek flat minima for low-rank adaptation (LoRA)<n>We show that EFlat-LoRA achieves optimize efficiency comparable to that of LoRA while simultaneously attaining comparable or even better performance.
arXiv Detail & Related papers (2025-08-01T10:59:49Z)
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture [9.451914483640605]
Low-rank adaptation (LoRA) is a parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks.<n>To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method.<n> Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score.
arXiv Detail & Related papers (2025-02-27T04:20:47Z)
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment [20.382810396966473]
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs)<n>Current methods optimize LoRA by initializing with static singular value decomposition subsets, leading to suboptimal leveraging of pre-trained knowledge.<n>We propose underlineGreat LunderlineoRunderlineA Mixture-of-Experunderlinet (GOAT)<n>GOAT integrates relevant priors using an SVD-structured MoE, and aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor
arXiv Detail & Related papers (2025-02-24T06:48:13Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.<n>We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks.<n>Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks.<n>We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z)
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement [5.162783756846019]
Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning.<n>Low-Rank Adaptation (LoRA) methods like Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices for tuning fewer parameters.<n>LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-11-22T14:19:01Z)
LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full-finetuning change pre-trained models.<n>We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure.<n>We extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension.
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space.<n>We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (M) which boosts SAM s efficiency at no cost to its generalization performance. M includes two novel and efficient training strategies-StochasticWeight Perturbation and Sharpness-Sensitive Data Selection. We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis bases.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.