LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
- URL: http://arxiv.org/abs/2308.03303v1
- Date: Mon, 7 Aug 2023 05:12:27 GMT
- Title: LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
- Authors: Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li
- Abstract summary: We present LoRA-FA, a memory-efficient fine-tuning method that reduces the activation memory without performance degradation or expensive recomputation.
Our results show that LoRA-FA consistently achieves fine-tuning accuracy close to that of full-parameter fine-tuning and LoRA across different tasks.
- Score: 19.08716369943138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The low-rank adaptation (LoRA) method greatly reduces the number of
trainable parameters for fine-tuning large language models (LLMs), but it
still requires expensive activation memory to update the low-rank weights.
Reducing the number of LoRA layers or using activation recomputation could
harm the fine-tuning performance or increase the computational overhead. In
this work, we present LoRA-FA, a memory-efficient fine-tuning method that
reduces the activation memory without performance degradation or expensive
recomputation. LoRA-FA freezes the projection-down weight $A$ and updates the
projection-up weight $B$ in each LoRA layer. This ensures that the change in
model weights resides in a low-rank space during LLM fine-tuning, while
eliminating the requirement to store the full-rank input activations. We
conduct extensive experiments across multiple model types (RoBERTa, T5,
LLaMA) and model scales. Our results show that LoRA-FA consistently achieves
fine-tuning accuracy close to that of full-parameter fine-tuning and LoRA
across different tasks. Furthermore, LoRA-FA reduces the overall memory cost
by up to 1.4$\times$ compared to LoRA.
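To make the mechanism concrete, here is a minimal PyTorch-style sketch of a LoRA-FA layer as described above: the pre-trained weight $W_0$ and the projection-down weight $A$ are frozen, and only the projection-up weight $B$ is trained. The class name, initialization constants, and the `alpha` scaling hyperparameter are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class LoRAFALinear(nn.Module):
    """Minimal LoRA-FA layer: W0 and A are frozen, only B is trained."""

    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pre-trained weight W0
        # Projection-down weight A: randomly initialized, then frozen ("FA" = frozen A).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01,
                                   requires_grad=False)
        # Projection-up weight B: zero-initialized, the only trainable tensor here.
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # With A frozen, autograd only needs the rank-sized activation x @ A^T
        # to compute B's gradient, instead of the full-rank input x.
        return self.base(x) + (x @ self.lora_A.T) @ self.lora_B.T * self.scaling
```

This is where the activation-memory saving comes from: in vanilla LoRA the full-rank input $x$ must be saved to form $A$'s gradient, whereas here the saved tensor shrinks from the hidden dimension to the rank $r$.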
Related papers
- LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements.
This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
- LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.
Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.
We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
- RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization [38.23587031169402]
We propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization.
We evaluate RoLoRA across LLaMA2-7B/13B and LLaMA3-8B models, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B.
arXiv Detail & Related papers (2024-07-10T20:52:18Z)
- LoRA Learns Less and Forgets Less [25.09261710396838]
Low-Rank Adaptation (LoRA) is a widely used parameter-efficient finetuning method for large language models.
We compare the performance of LoRA and full finetuning on two target domains, programming and mathematics.
arXiv Detail & Related papers (2024-05-15T19:27:45Z)
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA).
Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA.
The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z)
- PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization [39.30090456724925]
Supervised fine-tuning is the most common method to adapt large language models (LLMs) to downstream tasks.
Full fine-tuning requires massive computational resources.
LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low-dimensional.
arXiv Detail & Related papers (2024-02-25T16:43:41Z)
- LoRA+: Efficient Low Rank Adaptation of Large Models [13.074320303580361]
We show that Low Rank Adaptation (LoRA) leads to suboptimal finetuning of models with large width (embedding dimension).
We then show that this suboptimality of LoRA can be corrected simply by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio.
In our experiments, LoRA$+$ improves performance (1-2% improvements) and finetuning speed (up to $\sim 2\times$ speedup) at the same computational cost as LoRA.
arXiv Detail & Related papers (2024-02-19T18:33:49Z)
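The LoRA+ recipe above boils down to optimizer configuration. Below is a minimal sketch using two AdamW parameter groups, assuming LoRA parameters can be found by the (assumed) names `lora_A`/`lora_B`; the `lr_ratio` of 16 is a placeholder, since the paper tunes the ratio rather than prescribing one value.

```python
import torch

def build_lora_plus_optimizer(model, lr: float = 2e-4, lr_ratio: float = 16.0):
    """Give B matrices a larger learning rate than A, per the LoRA+ recipe.

    Assumes LoRA parameters are named '...lora_A...' / '...lora_B...';
    lr_ratio is an illustrative choice, not a value from the paper.
    """
    a_params, b_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_A" in name:
            a_params.append(param)
        elif "lora_B" in name:
            b_params.append(param)
    return torch.optim.AdamW([
        {"params": a_params, "lr": lr},             # projection-down A: base LR
        {"params": b_params, "lr": lr * lr_ratio},  # projection-up B: scaled LR
    ])
```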
- DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Drawing on these findings to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA).
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
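A rough sketch of the decomposition DoRA describes: the adapted weight $V = W_0 + BA$ is split into a trainable per-column magnitude $m$ and a normalized direction $V/\lVert V\rVert$. The layer below is an illustrative reading of the abstract; the paper's exact normalization axis, scaling, and initialization details may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Illustrative DoRA layer: W' = m * V / ||V||_col with V = W0 + B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)  # stands in for the pre-trained W0
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Trainable magnitude, initialized to the column norms of W0.
        self.magnitude = nn.Parameter(self.weight.norm(dim=0, keepdim=True).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.weight + self.lora_B @ self.lora_A   # direction, pre-normalization
        v = v / v.norm(dim=0, keepdim=True)           # unit-norm columns
        return F.linear(x, self.magnitude * v)        # rescale by learned magnitude
```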
- Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning [31.036465632204663]
We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm.
We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z)
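Reading the abstract above, the residual-learning loop can be sketched as: train a LoRA module, merge it into the frozen base weights, re-initialize a fresh low-rank pair, and repeat, so each round fits a new residual. The helper below assumes a LoRA layer exposing `base`, `lora_A`, `lora_B`, and `scaling` attributes (as in the earlier sketch); `train_one_round` is a user-supplied training pass, and none of these names come from the COLA code.

```python
import torch

@torch.no_grad()
def merge_lora_(layer) -> None:
    """Fold the current low-rank update into the frozen base weight,
    then reset A and B so the next round learns a fresh residual."""
    layer.base.weight += (layer.lora_B @ layer.lora_A) * layer.scaling
    torch.nn.init.normal_(layer.lora_A, std=0.01)
    torch.nn.init.zeros_(layer.lora_B)

def chain_of_lora(model, lora_layers, train_one_round, num_rounds: int = 3):
    """Schematic COLA-style loop over merge-and-restart rounds."""
    for _ in range(num_rounds):
        train_one_round(model)      # optimize only the LoRA parameters
        for layer in lora_layers:
            merge_lora_(layer)      # tie off this residual, start the next
```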
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks.
We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z)
- LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaptation (LoRA) has emerged as a popular way to fine-tune large language models (LLMs).
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune reduces perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)
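LoRAPrune's actual criterion scores weights using LoRA's gradients rather than full-model gradients; the exact formula is not given in the summary above, so the skeleton below substitutes a plain magnitude score on the merged weight $W_0 + BA$, purely to illustrate the structured (per-output-channel) pruning step, not the paper's method.

```python
import torch

@torch.no_grad()
def prune_rows_by_magnitude(layer, sparsity: float = 0.5) -> torch.Tensor:
    """Structured-pruning skeleton: score each output channel of the merged
    weight (W0 + B @ A) by L2 norm and zero out the weakest rows.

    Magnitude scoring is a stand-in for LoRAPrune's LoRA-gradient-guided
    criterion; layer attribute names follow the earlier sketches.
    """
    merged = layer.base.weight + (layer.lora_B @ layer.lora_A) * layer.scaling
    scores = merged.norm(dim=1)              # one score per output row
    k = int(sparsity * scores.numel())
    pruned = torch.argsort(scores)[:k]       # indices of the weakest rows
    layer.base.weight[pruned] = 0.0
    layer.lora_B[pruned] = 0.0               # keep the low-rank update consistent
    return pruned
```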