Asymmetry in Low-Rank Adapters of Foundation Models
- URL: http://arxiv.org/abs/2402.16842v2
- Date: Tue, 27 Feb 2024 18:06:29 GMT
- Title: Asymmetry in Low-Rank Adapters of Foundation Models
- Authors: Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de
Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh
Ghassemi, Mikhail Yurochkin, Justin Solomon
- Abstract summary: This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices.
We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
- Score: 47.310550805920585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning optimizes large, pre-trained foundation
models by updating a subset of parameters; in this class, Low-Rank Adaptation
(LoRA) is particularly effective. Inspired by an effort to investigate the
different roles of LoRA matrices during fine-tuning, this paper characterizes
and leverages unexpected asymmetry in the importance of low-rank adapter
matrices. Specifically, when updating the parameter matrices of a neural
network by adding a product $BA$, we observe that the $B$ and $A$ matrices have
distinct functions: $A$ extracts features from the input, while $B$ uses these
features to create the desired output. Based on this observation, we
demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning
$A$, and that a random untrained $A$ should perform nearly as well as a
fine-tuned one. Using an information-theoretic lens, we also bound the
generalization of low-rank adapters, showing that the parameter savings of
exclusively training $B$ improves the bound. We support our conclusions with
experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
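The adapter structure described in the abstract can be illustrated with a minimal NumPy sketch (not the authors' code; layer sizes, rank, and initialization scale are illustrative assumptions). It shows the LoRA update $W_0 + BA$ with $A$ kept as a frozen random projection and only $B$ treated as trainable, matching the paper's asymmetry claim:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 8, 4  # illustrative sizes, not from the paper

# Frozen pretrained weight W0 (stand-in random values).
W0 = rng.standard_normal((d_out, d_in))

# LoRA factors: A extracts rank-r features from the input,
# B maps those features to the output correction.
# Per the paper's result, A can remain a frozen random matrix
# while only B is fine-tuned.
A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)  # frozen, random
B = np.zeros((d_out, rank))                            # trainable, zero-init

def lora_forward(x):
    """Adapted layer: y = W0 x + B (A x)."""
    return W0 @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as a no-op update:
assert np.allclose(lora_forward(x), W0 @ x)
```

Training only $B$ halves the adapter's trainable parameters relative to training both factors, which is the parameter saving the generalization bound in the paper exploits.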
Related papers
- Transfer Q Star: Principled Decoding for LLM Alignment [105.89114186982972]
Transfer $Q^*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - A Single Linear Layer Yields Task-Adapted Low-Rank Matrices [4.695004706877747]
Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$.
We show that CondLoRA maintains a performance on par with LoRA, despite the fact that the trainable parameters of CondLoRA are fewer than those of LoRA.
arXiv Detail & Related papers (2024-03-22T04:38:42Z) - RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation [30.797422827190278]
We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis.
RoSA trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights.
We show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget.
arXiv Detail & Related papers (2024-01-09T17:09:01Z) - GIFT: Generative Interpretable Fine-Tuning [8.481707805559589]
We present Generative Interpretable Fine-Tuning (GIFT) for parameter-efficient fine-tuning of pretrained Transformer backbones.
$\Theta$ can be shared by all layers selected for fine-tuning, or can be layer-type specific.
We show the output of the first linear layer (i.e., $\omega \cdot \phi$) is surprisingly interpretable.
arXiv Detail & Related papers (2023-12-01T16:33:57Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - $\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale
Pre-Trained Models via Label Representation Learning [47.742220473129684]
$\mathcal{Y}$-tuning learns dense representations for labels defined in a given task and aligns them to the fixed feature representation.
For $\text{DeBERTa}_{\text{XXL}}$ with 1.6 billion parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of full fine-tuning performance on the GLUE Benchmark.
arXiv Detail & Related papers (2022-02-20T13:49:34Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.