Asymmetry in Low-Rank Adapters of Foundation Models
- URL: http://arxiv.org/abs/2402.16842v2
- Date: Tue, 27 Feb 2024 18:06:29 GMT
- Title: Asymmetry in Low-Rank Adapters of Foundation Models
- Authors: Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de
Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh
Ghassemi, Mikhail Yurochkin, Justin Solomon
- Abstract summary: This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices.
We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
- Score: 47.310550805920585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning optimizes large, pre-trained foundation
models by updating a subset of parameters; in this class, Low-Rank Adaptation
(LoRA) is particularly effective. Inspired by an effort to investigate the
different roles of LoRA matrices during fine-tuning, this paper characterizes
and leverages unexpected asymmetry in the importance of low-rank adapter
matrices. Specifically, when updating the parameter matrices of a neural
network by adding a product $BA$, we observe that the $B$ and $A$ matrices have
distinct functions: $A$ extracts features from the input, while $B$ uses these
features to create the desired output. Based on this observation, we
demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning
$A$, and that a random untrained $A$ should perform nearly as well as a
fine-tuned one. Using an information-theoretic lens, we also bound the
generalization of low-rank adapters, showing that the parameter savings of
exclusively training $B$ improves the bound. We support our conclusions with
experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
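To make the asymmetric recipe concrete, here is a minimal PyTorch sketch of a LoRA layer in which $A$ is a frozen random projection and only $B$ is trained; the class name, scaling, and initialization choices are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class LoRALinearBOnly(nn.Module):
    """LoRA layer with a frozen random A and a trainable B (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pre-trained weights stay frozen
        # A: random, untrained feature extractor (kept fixed)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) / rank ** 0.5,
                              requires_grad=False)
        # B: trained output map; zero init so the adapter starts as a no-op
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T

layer = LoRALinearBOnly(nn.Linear(768, 768), rank=8)
print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['B']
```

Training only B in this setup uses half the adapter parameters of standard LoRA at the same rank, which is exactly the saving the generalization bound in the abstract refers to.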
Related papers
- Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
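The summary above only names the ingredients; as a rough, hedged sketch of the general idea (a dense update formed from two low-rank factors, sparsified by a magnitude-based top-k mask that is recomputed from the values rather than stored as indexes), one could write something like the following. The kernelization step is omitted and all names are hypothetical.

```python
import torch

def sparse_lowrank_update(B: torch.Tensor, A: torch.Tensor, density: float = 0.05):
    """Form a dense update from two low-rank factors, then keep only the
    largest-magnitude entries; the mask is recomputed from the values, so no
    index tensor needs to be stored (illustrative, kernelization omitted)."""
    delta = B @ A                                        # (out, in) dense update
    k = max(1, int(density * delta.numel()))
    # threshold = k-th largest absolute value
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    return delta * (delta.abs() >= threshold).to(delta.dtype)

B = torch.randn(256, 8, requires_grad=True)
A = torch.randn(8, 256, requires_grad=True)
print((sparse_lowrank_update(B, A) != 0).float().mean())  # roughly 0.05
```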
arXiv Detail & Related papers (2024-11-04T04:58:20Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
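A hedged sketch of one way a low-rank tensor (CP-style) parametrization could tie the per-layer updates together; the factor layout and initialization below are assumptions, not necessarily the paper's exact construction.

```python
import torch
import torch.nn as nn

class CPTensorAdapter(nn.Module):
    """Parameterize the stack of per-layer updates Delta W[l] as a rank-r CP
    decomposition of a (layers x out x in) tensor, so the factors are shared
    across layers instead of storing one (B, A) pair per layer (illustrative)."""

    def __init__(self, n_layers: int, out_f: int, in_f: int, rank: int = 8):
        super().__init__()
        self.layer_factor = nn.Parameter(torch.randn(n_layers, rank) * 0.02)
        self.out_factor = nn.Parameter(torch.randn(out_f, rank) * 0.02)
        self.in_factor = nn.Parameter(torch.zeros(in_f, rank))  # zero init => Delta W = 0

    def delta(self, layer: int) -> torch.Tensor:
        # Delta W[l] = out_factor @ diag(layer_factor[l]) @ in_factor.T
        return (self.out_factor * self.layer_factor[layer]) @ self.in_factor.T

adapter = CPTensorAdapter(n_layers=12, out_f=768, in_f=768, rank=8)
print(adapter.delta(0).shape)  # torch.Size([768, 768])
```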
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models [7.108651381160281]
Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models.
We propose CoRA: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models.
Our experiments show that the first approach achieves the same efficacy as original LoRA fine-tuning while being more efficient, roughly halving the trainable parameters.
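A hedged sketch of the shared-subspace idea: $B$ is replaced by a frozen matrix common to all adapted layers, and only the per-layer $A$ matrices are trained. How the common subspace is extracted from large models is out of scope here, so a placeholder tensor is used.

```python
import torch
import torch.nn as nn

class SharedBAdapter(nn.Module):
    """B is a frozen subspace shared by all adapted layers; only the per-layer
    A is trained, halving the trainable parameters at a given rank (illustrative)."""

    def __init__(self, shared_B: torch.Tensor, in_f: int):
        super().__init__()
        self.register_buffer("B", shared_B)              # frozen common subspace, (out_f, r)
        self.A = nn.Parameter(torch.zeros(shared_B.shape[1], in_f))  # trainable, zero init

    def delta(self) -> torch.Tensor:
        return self.B @ self.A                           # (out_f, in_f) update

common_B = torch.randn(768, 8)                           # placeholder for the shared subspace
adapters = [SharedBAdapter(common_B, in_f=768) for _ in range(12)]
print(sum(p.numel() for a in adapters for p in a.parameters() if p.requires_grad))
```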
arXiv Detail & Related papers (2024-08-31T12:48:27Z) - Parameter-Efficient Fine-Tuning via Circular Convolution [29.442868470645482]
Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models.
We propose Circular Convolution Adaptation (C$^3$A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization.
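A hedged sketch of a circular-convolution (circulant) update for a square layer: the entire update is defined by a single kernel vector and applied through an FFT, which keeps parameter count and compute low while not restricting the update to low rank. Names and initialization are illustrative.

```python
import torch
import torch.nn as nn

class CircularConvAdapter(nn.Module):
    """The update is a circulant matrix defined by one kernel vector and applied
    via FFT: d parameters, O(d log d) compute, yet not limited to low rank
    (illustrative, assumes a square layer)."""

    def __init__(self, dim: int):
        super().__init__()
        self.kernel = nn.Parameter(torch.zeros(dim))     # zero init => no update at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        Xf = torch.fft.rfft(x, dim=-1)
        Kf = torch.fft.rfft(self.kernel)
        return torch.fft.irfft(Xf * Kf, n=x.shape[-1], dim=-1)

adapter = CircularConvAdapter(768)
print(adapter(torch.randn(4, 768)).shape)                # torch.Size([4, 768])
```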
arXiv Detail & Related papers (2024-07-27T21:12:46Z) - SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models.
SBoRA either halves the number of trainable parameters of LoRA or doubles the rank at a similar number of trainable parameters.
Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
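A hedged sketch of the standard-basis idea behind the regional updates: with the down-projection fixed to r standard basis vectors, the product $BA$ only touches the r selected columns of the weight, and the $A$ side reduces to indexing. The index choice and class name below are assumptions.

```python
import torch
import torch.nn as nn

class StandardBasisAdapter(nn.Module):
    """The down-projection is fixed to r standard basis vectors, so B @ A only
    touches the r selected columns of the weight and the A side reduces to
    indexing: no storage, no matmul (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        # which standard basis vectors play the role of A (frozen choice)
        self.register_buffer("idx", torch.randperm(base.in_features)[:rank])
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x[..., self.idx] @ self.B.T

layer = StandardBasisAdapter(nn.Linear(768, 768), rank=8)
print(layer(torch.randn(2, 768)).shape)                  # torch.Size([2, 768])
```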
arXiv Detail & Related papers (2024-07-07T15:37:13Z) - Transfer Q Star: Principled Decoding for LLM Alignment [105.89114186982972]
Transfer $Q^*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - A Single Linear Layer Yields Task-Adapted Low-Rank Matrices [4.695004706877747]
Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$.
We show that CondLoRA maintains a performance on par with LoRA, despite the fact that the trainable parameters of CondLoRA are fewer than those of LoRA.
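A heavily hedged sketch of the single-linear-layer idea: one trainable linear map, shared across layers, turns a fixed low-rank view of the frozen weight $W_0$ into the adapter factors. Using the top-r right singular vectors of $W_0$ as that view is our assumption, not necessarily the paper's construction.

```python
import torch
import torch.nn as nn

class CondAdapterSketch(nn.Module):
    """One trainable linear map, shared across layers, turns a fixed low-rank
    view of the frozen weight W0 into the adapter factors A and B. The choice
    of conditioning signal (top-r right singular vectors of W0) is an assumption."""

    def __init__(self, rank: int, in_f: int, out_f: int):
        super().__init__()
        self.rank = rank
        self.map_A = nn.Linear(in_f, in_f, bias=False)   # shared across all layers
        self.map_B = nn.Linear(in_f, out_f, bias=False)
        nn.init.zeros_(self.map_B.weight)                # Delta W starts at zero

    def delta(self, W0: torch.Tensor) -> torch.Tensor:
        _, _, Vh = torch.linalg.svd(W0, full_matrices=False)
        V_r = Vh[: self.rank]                            # (r, in_f) conditioning input
        A = self.map_A(V_r)                              # (r, in_f)
        B = self.map_B(V_r)                              # (r, out_f)
        return B.T @ A                                   # (out_f, in_f)

gen = CondAdapterSketch(rank=8, in_f=768, out_f=768)
print(gen.delta(torch.randn(768, 768)).shape)            # torch.Size([768, 768])
```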
arXiv Detail & Related papers (2024-03-22T04:38:42Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
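A hedged sketch of SVD-style budget allocation: the update is parameterized as $P \Lambda Q$ and the least important diagonal entries are masked to meet a budget; the importance score below is a simplified stand-in for the paper's sensitivity-based score.

```python
import torch
import torch.nn as nn

class SVDAdapter(nn.Module):
    """Update parameterized as P diag(lam) Q; low-importance singular values are
    masked out to meet a parameter budget. The importance score here is a
    simplified |lam * grad(lam)| stand-in for a sensitivity-based score."""

    def __init__(self, out_f: int, in_f: int, max_rank: int = 12):
        super().__init__()
        self.P = nn.Parameter(torch.randn(out_f, max_rank) * 0.02)
        self.Q = nn.Parameter(torch.randn(max_rank, in_f) * 0.02)
        self.lam = nn.Parameter(torch.zeros(max_rank))   # singular values, zero init
        self.register_buffer("mask", torch.ones(max_rank))

    def delta(self) -> torch.Tensor:
        return self.P @ torch.diag(self.lam * self.mask) @ self.Q

    @torch.no_grad()
    def prune_to_budget(self, budget: int):
        grad = self.lam.grad if self.lam.grad is not None else torch.zeros_like(self.lam)
        keep = (self.lam * grad).abs().topk(budget).indices
        self.mask.zero_()
        self.mask[keep] = 1.0

adapter = SVDAdapter(768, 768, max_rank=12)
adapter.prune_to_budget(4)                               # keep the 4 most important directions
print(int(adapter.mask.sum()))                           # 4
```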
arXiv Detail & Related papers (2023-03-18T22:36:25Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)