Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
- URL: http://arxiv.org/abs/2407.01491v1
- Date: Mon, 1 Jul 2024 17:28:59 GMT
- Title: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
- Authors: Siwei Li, Yifan Yang, Yifei Shen, Fangyun Wei, Zongqing Lu, Lili Qiu, Yuqing Yang
- Abstract summary: LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
- Score: 55.5715496559514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities while preserving its training efficiency. Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns. Additionally, we introduce a slow-fast update mechanism and cascading noisy tuning to bolster generalization. Extensive experiments on various language and vision datasets, as well as robustness benchmarks, demonstrate that the proposed method not only significantly outperforms existing baselines, but also mitigates overfitting, enhances model stability, and improves OOD robustness. Code will be released at https://github.com/microsoft/LoRASC soon.
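The abstract's three ingredients (cascaded low-rank experts, a slow-fast update, and cascading noisy tuning) can be made concrete with a minimal sketch. The snippet below is a hypothetical PyTorch illustration, not the authors' released implementation; the class name `LoRALinear`, the merge coefficient `beta`, and `noise_std` are assumptions chosen only to show the mechanism of periodically folding a fast low-rank adapter into slow base weights and restarting it with small noise.

```python
# Hypothetical sketch of cascaded low-rank adaptation with a slow-fast update
# and noisy re-initialization, loosely following the abstract's description.
# Names and constants (rank, beta, noise_std) are illustrative assumptions.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual: W x + B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, noise_std: float = 1e-3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # slow weights are frozen
        self.A = nn.Parameter(noise_std * torch.randn(rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.t() @ self.B.t()

    @torch.no_grad()
    def cascade_step(self, beta: float = 0.5, noise_std: float = 1e-3) -> None:
        """Fold a fraction `beta` of the fast low-rank update into the slow
        (merged) weights, then restart a fresh adapter with small noise."""
        delta = self.B @ self.A                          # current low-rank update
        self.base.weight.add_(beta * delta)              # slow weights absorb it
        self.A.copy_(noise_std * torch.randn_like(self.A))
        self.B.zero_()                                   # next expert starts at zero


# Usage: call cascade_step() at the end of each training stage (e.g. each epoch),
# so the next stage trains a new low-rank expert on top of the accumulated slow
# weights; over stages this behaves like a mixture of low-rank adaptations.
layer = LoRALinear(nn.Linear(768, 768))
```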
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models [0.0]
Low-Rank Adaptation (LoRA) has emerged as a promising method to mitigate these issues.
OLoRA significantly accelerates the convergence of LLM training.
OLoRA exhibits improved performance compared to standard LoRA across a variety of language modeling tasks.
arXiv Detail & Related papers (2024-06-03T20:37:27Z) - Rethinking Model Re-Basin and Linear Mode Connectivity [1.1510009152620668]
We decompose re-normalization into rescaling and reshift, uncovering that rescaling plays a crucial role in re-normalization.
We identify that the merged model suffers from the issue of activation collapse and magnitude collapse.
We propose a new perspective to unify the re-basin and pruning, under which a lightweight yet effective post-pruning technique is derived.
arXiv Detail & Related papers (2024-02-05T17:06:26Z) - Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Towards Deeper Deep Reinforcement Learning [42.960199987696306]
In contrast with computer vision and natural language processing, state-of-the-art reinforcement learning algorithms often use only small MLPs.
We show that dataset size is not the limiting factor, and instead argue that instability from the actor in SAC taking gradients through the critic is the culprit.
arXiv Detail & Related papers (2021-06-02T13:41:02Z) - Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarial training models.
By co-training the learned optimizer's parameters with the model's weights, the proposed framework consistently improves robustness and adaptively decides the update directions and step sizes.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)