Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
- URL: http://arxiv.org/abs/2506.05713v2
- Date: Sun, 27 Jul 2025 08:06:57 GMT
- Title: Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation
- Authors: Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma, Yu Zhang, Ying Wei,
- Abstract summary: Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models. We propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning.
- Score: 21.137278840000366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
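The progressive schedule described in the abstract can be sketched as follows. The linear ramp, the starting probability `p0`, and the independent per-layer Bernoulli masking are illustrative assumptions, not the paper's exact hyperparameters:

```python
import random

def coto_activation_prob(step, total_steps, p0=0.25):
    """Activation probability ramps from p0 up to 1.0 over fine-tuning.

    A linear schedule is assumed here for illustration; by the end of
    training every adapter is always active, matching standard LoRA.
    """
    frac = min(step / total_steps, 1.0)
    return p0 + (1.0 - p0) * frac

def sample_adapter_mask(num_adapters, p, rng):
    """Independently keep each layer's adapter active with probability p.

    Deactivated adapters (mask 0) are skipped for that step, pushing the
    remaining adapters to compensate and explore the loss landscape.
    """
    return [1 if rng.random() < p else 0 for _ in range(num_adapters)]

# Example: early in training many adapters are dropped; by the end, all are kept.
rng = random.Random(0)
early_p = coto_activation_prob(step=0, total_steps=1000)
late_p = coto_activation_prob(step=1000, total_steps=1000)
```

Because the activation probability reaches 1.0 at the end of training, the final model uses all adapters deterministically, so inference is unchanged relative to standard LoRA.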
Related papers
- Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners [1.2461503242570642]
This paper presents a study on adapting the Contrastive Captioners (CoCa) visual backbone for few-shot image classification. We identify an "augmentation divergence": while strong data augmentation degrades the performance of linear probing in low-shot settings, it is essential for stabilizing LoRA fine-tuning. We also demonstrate that hybrid objectives incorporating Supervised Contrastive (SupCon) loss yield consistent performance improvements over standard Cross-Entropy.
arXiv Detail & Related papers (2025-12-14T20:13:21Z) - Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z) - AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition [41.654675205772485]
We propose a vision-language fine-tuning framework based on cross-layer tensor ring decomposition (TRD) with the integration and collaboration of diverse adapters, called AdaRing. Our experiments show that the proposed AdaRing achieves state-of-the-art performance while reducing average training parameters by 90%.
arXiv Detail & Related papers (2025-08-16T01:56:27Z) - Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts [72.22148263683037]
We study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks.
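One way such sparse adapters might be merged across tasks can be sketched as follows; the dict-of-deltas representation and the averaging rule for overlapping entries are assumptions for illustration, not the paper's exact procedure:

```python
def merge_sparse_adapters(adapters):
    """Merge sparse weight deltas trained on several tasks.

    Each adapter maps a flat parameter index to its trained delta; entries
    touched by multiple adapters are averaged, while entries unique to one
    task are kept as-is.
    """
    sums, counts = {}, {}
    for adapter in adapters:
        for idx, delta in adapter.items():
            sums[idx] = sums.get(idx, 0.0) + delta
            counts[idx] = counts.get(idx, 0) + 1
    return {idx: sums[idx] / counts[idx] for idx in sums}

# Two task adapters that overlap only on parameter index 1.
merged = merge_sparse_adapters([{0: 1.0, 1: 2.0}, {1: 4.0}])
```

Because each adapter touches only a small subset of weights, overlaps between tasks are rare, which is one intuition for why sparse adapters can merge with little interference.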
arXiv Detail & Related papers (2025-07-09T03:25:45Z) - Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to the task awareness, our method enables two optional adaptation modes, knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z) - DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation [44.99833362998488]
Decoupled Low-rank Adaptation (DeLoRA) is a novel fine-tuning method that normalizes and scales learnable low-rank matrices. We show that DeLoRA matches or surpasses the performance of competing PEFT methods, while exhibiting stronger robustness.
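A minimal sketch of the decoupling idea, assuming the low-rank product is normalized to unit Frobenius norm with a separate learnable strength (DeLoRA's exact normalization may differ):

```python
import numpy as np

def delora_delta(B, A, strength):
    """Decoupled low-rank update: the direction comes from B @ A normalized
    to unit Frobenius norm, while the magnitude is a separate scalar."""
    BA = B @ A
    return strength * BA / np.linalg.norm(BA)

rng = np.random.default_rng(0)
B, A = rng.normal(size=(6, 2)), rng.normal(size=(2, 6))
delta = delora_delta(B, A, strength=0.5)
```

Decoupling angle from strength bounds how far the update can move the weights, which is one plausible source of the robustness the paper reports.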
arXiv Detail & Related papers (2025-03-23T22:00:56Z) - OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters. They often pose optimization challenges, with poor convergence. We introduce an over-parameterized approach that accelerates training without increasing inference costs. We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z) - ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers [37.77593687901923]
ASLoRA is a cross-layer parameter-sharing strategy combining global sharing with partial adaptive sharing. We conduct experiments on various NLP tasks, showing that ASLoRA outperforms LoRA while using less than 25% of the parameters.
arXiv Detail & Related papers (2024-12-13T13:32:13Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - Towards Optimal Adapter Placement for Efficient Transfer Learning [73.1149084352343]
Parameter-efficient transfer learning (PETL) aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters.
Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections.
This paper investigates the relationship between the placement of an adapter and its performance.
arXiv Detail & Related papers (2024-10-21T10:37:17Z) - Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
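The cascade can be sketched as folding each stage's low-rank product into the weights before the next stage trains; this is a simplified view under that assumption, omitting any slow/fast update details:

```python
import numpy as np

def lorasc_cascade(W0, stages):
    """Fold a sequence of low-rank updates B @ A into the base weight, one
    stage at a time. Together the stages form a mixture of low-rank
    adaptations whose combined rank can exceed any single stage's rank."""
    W = W0.copy()
    for B, A in stages:
        W = W + B @ A
    return W

rng = np.random.default_rng(0)
W0 = np.zeros((4, 4))
# Three rank-1 stages; their sum can have rank up to 3.
stages = [(rng.normal(size=(4, 1)), rng.normal(size=(1, 4))) for _ in range(3)]
W = lorasc_cascade(W0, stages)
```

This illustrates how a cascade of individually cheap updates can accumulate expressiveness that a single low-rank adapter of the same per-stage rank cannot.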
arXiv Detail & Related papers (2024-07-01T17:28:59Z) - MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning [20.68925288222065]
Mixture of Sparse Adapters, or MoSA, is a novel Adapter Tuning method.
MoSA achieves significantly better performance than standard adapters without any additional computational or storage overhead.
MoSA consistently outperforms other Adapter Tuning methods as well as other baselines by a large margin.
arXiv Detail & Related papers (2023-12-05T17:50:55Z) - Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z) - Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference [53.65845680932835]
Conditional Adapter (CoDA) is a parameter-efficient transfer learning method that also improves inference efficiency.
Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up.
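CoDA's speed-up comes from conditional computation: only a subset of tokens is routed through the heavier adapter path. A toy sketch of such top-k routing, where the scoring function and k are illustrative assumptions:

```python
def route_top_k(token_scores, k):
    """Return indices of the k highest-scoring tokens; only these pass
    through the adapter, while the rest take the cheap skip path."""
    ranked = sorted(range(len(token_scores)), key=lambda i: -token_scores[i])
    return sorted(ranked[:k])

# Only the two most relevant of four tokens get the adapter computation.
selected = route_top_k([0.1, 0.9, 0.3, 0.8], k=2)
```

Processing only k of n tokens through the expensive path is what makes a multi-fold inference speed-up possible.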
arXiv Detail & Related papers (2023-04-11T03:17:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.