Related papers: Parameter-Efficient Fine-Tuning via Circular Convolution

Parameter-Efficient Fine-Tuning via Circular Convolution

URL: http://arxiv.org/abs/2407.19342v2
Date: Wed, 21 Aug 2024 05:44:11 GMT
Title: Parameter-Efficient Fine-Tuning via Circular Convolution
Authors: Aochuan Chen, Jiashun Cheng, Zijing Liu, Ziqi Gao, Fugee Tsung, Yu Li, Jia Li,
Abstract summary: Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models. We propose Circular Convolution Adaptation (C$3$A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization.
Score: 29.442868470645482
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models, leveraging low-rank matrices $\mathbf{A}$ and $\mathbf{B}$ to represent weight changes (i.e., $\Delta \mathbf{W} = \mathbf{B} \mathbf{A}$). This method reduces trainable parameters and mitigates heavy memory consumption associated with full delta matrices by sequentially multiplying $\mathbf{A}$ and $\mathbf{B}$ with the activation. Despite its success, the intrinsic low-rank characteristic may limit its performance. Although several variants have been proposed to address this issue, they often overlook the crucial computational and memory efficiency brought by LoRA. In this paper, we propose Circular Convolution Adaptation (C$^3$A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization. Extensive experiments demonstrate that C$^3$A consistently outperforms LoRA and its variants across various fine-tuning tasks.

Related papers

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization [27.907707931902547]
We investigate two phenomena related to the attention mechanism during the fine-tuning of Large Language Models. The first phenomenon, termed "Unequal Importance of Attention Matrices," highlights the impact of fine-tuning different weight matrices. The second phenomenon, "Attention Matrices with Customized Learning Rate Leads to Better Convergence," emphasizes the importance of assigning distinct learning rates.
arXiv Detail & Related papers (2024-10-03T06:37:37Z)
CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models [7.108651381160281]
Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models. We propose textbfCoRA: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models. Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters.
arXiv Detail & Related papers (2024-08-31T12:48:27Z)
LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning. We extend the LoRA to multiple scales, dubbed as LoRA$2$.
arXiv Detail & Related papers (2024-08-13T12:31:30Z)
SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models. SBoRA reduces the number of trainable parameters by half or doubles the rank with the similar number of trainable parameters as LoRA. Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2024-07-07T15:37:13Z)
Asymmetry in Low-Rank Adapters of Foundation Models [47.310550805920585]
This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
arXiv Detail & Related papers (2024-02-26T18:59:12Z)
Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters. We propose SpIEL, a novel sparse finetuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z)
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches. In specific, each batch collects data according to a policy that depends on previous batches and the rewards are revealed only at the end of the batch. Our algorithm achieves regret bounds comparable to those in fully sequential setting with only $mathcalO( log T)$ batches.
arXiv Detail & Related papers (2023-11-22T06:06:54Z)
Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices [27.693028578653394]
Delta-LoRA is a novel parameter-efficient approach to fine-tune large language models (LLMs) In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $bA$ and $bB$, but also propagate the learning to the pre-trained weights $bW$.
arXiv Detail & Related papers (2023-09-05T17:40:34Z)
Monarch: Expressive Structured Matrices for Efficient and Accurate Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune. A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones. We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $mathcalO(fraclogNsqrtN)$ to $mathcal O(e-sqrtN)$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation. We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $mathcal O(N-sqrt
arXiv Detail & Related papers (2021-05-10T13:07:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.