Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
- URL: http://arxiv.org/abs/2405.03003v1
- Date: Sun, 5 May 2024 17:15:24 GMT
- Title: Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
- Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li
- Abstract summary: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models.
We introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients.
Our method shows comparable or better performance with fewer parameters than LoRA on various tasks.
- Score: 26.563344030824414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank adaptation~(LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W=BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by enjoying the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover $\Delta W$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at \url{https://github.com/Chaos96/fourierft}.
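Since the abstract fully specifies the mechanism (a few learned spectral coefficients at fixed locations, recovered to $\Delta W$ by an inverse 2D DFT), a minimal PyTorch sketch is possible. The module name, coefficient count, and scaling default below are illustrative assumptions; the released code at https://github.com/Chaos96/fourierft is authoritative.

```python
# Minimal sketch of the FourierFT idea, not the authors' released code.
import torch
import torch.nn as nn

class FourierFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, n_coeffs: int = 1000, scale: float = 300.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze pretrained W
        d_out, d_in = base.weight.shape
        self.scale = scale
        # Fixed random spectral locations, identical at train and test time.
        idx = torch.randperm(d_out * d_in)[:n_coeffs]
        self.register_buffer("rows", idx // d_in)
        self.register_buffer("cols", idx % d_in)
        # The only trainable parameters: n spectral coefficients.
        self.coeffs = nn.Parameter(torch.zeros(n_coeffs))

    def delta_w(self) -> torch.Tensor:
        d_out, d_in = self.base.weight.shape
        spectrum = torch.zeros(d_out, d_in, dtype=torch.cfloat,
                               device=self.coeffs.device)
        spectrum[self.rows, self.cols] = self.coeffs.to(torch.cfloat)
        # Inverse 2D DFT recovers the spatial-domain weight change.
        return torch.fft.ifft2(spectrum).real * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_w().t()
```

An adapted layer then stores only $n$ trainable scalars plus two fixed index buffers, rather than LoRA's $r(d_{in} + d_{out})$ parameters per layer, which is consistent with the 0.064M-versus-33.5M comparison quoted above.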
Related papers
- ReFT: Representation Finetuning for Language Models [74.51093640257892]
We develop a family of Representation Finetuning (ReFT) methods.
ReFTs operate on a frozen base model and learn task-specific interventions on hidden representations.
We showcase LoReFT, a low-rank instance of the ReFT family, on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction-tuning, and GLUE.
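As we read the ReFT paper, the low-rank instance (LoReFT) edits a hidden state as $\Phi(h) = h + R^\top(Wh + b - Rh)$ with a row-orthonormal $R$, confining the edit to an $r$-dimensional subspace. Below is a minimal sketch of that intervention; the naming is ours, not the authors' pyreft API.

```python
# Hedged sketch of a LoReFT-style intervention on hidden representations.
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # Orthogonal parametrization keeps the rows of R orthonormal.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(d_model, rank, bias=False))
        self.proj = nn.Linear(d_model, rank)   # computes W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Applied to hidden states of a frozen base model at chosen
        # layers/positions; only this module is trained.
        Rh = self.R(h)                          # project h into the subspace
        return h + (self.proj(h) - Rh) @ self.R.weight
```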
arXiv Detail & Related papers (2024-04-04T17:00:37Z) - Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [92.38964762187477]
We propose LoRAT, a method that unveils the power of larger Vision Transformers (ViT) for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency.
We design an anchor-free head solely based on a multilayer perceptron (MLP) to adapt PETR, enabling better performance with less computational overhead.
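The summary gives the design constraint (anchor-free, MLP-only) but not the exact architecture, so the following is only a generic illustration of such a head, not LoRAT's actual implementation.

```python
# Hypothetical MLP-only anchor-free head: each ViT token predicts a
# foreground score and a normalized box, with no convolutions or anchors.
import torch
import torch.nn as nn

class MLPAnchorFreeHead(nn.Module):
    def __init__(self, d_model: int, hidden: int = 256):
        super().__init__()
        def mlp(d_out: int) -> nn.Sequential:
            return nn.Sequential(nn.Linear(d_model, hidden), nn.GELU(),
                                 nn.Linear(hidden, d_out))
        self.cls = mlp(1)   # per-token foreground score
        self.box = mlp(4)   # per-token normalized box (cx, cy, w, h)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, d_model) from the search-region ViT
        return self.cls(tokens).squeeze(-1), self.box(tokens).sigmoid()
```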
arXiv Detail & Related papers (2024-03-08T11:41:48Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Building on these findings and aiming to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA).
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
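A minimal sketch of that decomposition, assuming the common reading of DoRA, $W' = m \cdot (W_0 + BA)/\lVert W_0 + BA\rVert_c$ with $m$ initialized to the column-wise norms of $W_0$; names and defaults are illustrative, not the official implementation.

```python
# Hedged sketch of DoRA: trainable magnitude + LoRA-updated direction.
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        w0 = base.weight.detach().clone()     # frozen pretrained W0, (d_out, d_in)
        self.register_buffer("w0", w0)
        self.bias = base.bias                 # reuse the pretrained bias
        d_out, d_in = w0.shape
        # Trainable magnitude, one entry per output unit (the column-wise
        # norm ||.||_c in the paper's convention).
        self.m = nn.Parameter(w0.norm(dim=1))
        # LoRA pair updating only the direction component; B is zero-
        # initialized, so the layer starts out exactly equal to W0.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.w0 + self.B @ self.A                        # updated direction
        w = self.m.unsqueeze(1) * v / v.norm(dim=1, keepdim=True)
        return nn.functional.linear(x, w, self.bias)
```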
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa$_\mathrm{Large}$ and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5$_\mathrm{XXL}$.
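The summary does not state the form of the density loss, so the snippet below is a hypothetical stand-in (an L1 surrogate over activations collected via forward hooks), not the paper's actual objective.

```python
# Hypothetical density penalty, added to the task loss during fine-tuning
# to push activation density down; not DEFT's exact formulation.
import torch

def activation_density_loss(acts: list[torch.Tensor], weight: float = 1e-4) -> torch.Tensor:
    # acts: hidden activations gathered from MLP blocks via forward hooks.
    # Mean absolute activation is a smooth surrogate for the fraction of
    # active (nonzero) units.
    return weight * sum(a.abs().mean() for a in acts)
```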
arXiv Detail & Related papers (2024-02-02T21:25:46Z) - GIFT: Generative Interpretable Fine-Tuning [8.481707805559589]
We present Generative Interpretable Fine-Tuning (GIFT) for parameter-efficient fine-tuning of pretrained Transformer backbones.
$\Theta$ can be shared by all layers selected for fine-tuning, or can be layer-type specific.
We show the output of the first linear layer (i.e., $\omega \cdot \phi$) is surprisingly interpretable.
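Reading the summary, $\Theta = (\phi, \psi)$ appears to generate the weight residual from the frozen weight $\omega$ itself, i.e. $\Delta\omega = (\omega\phi)\psi$. The sketch below follows that reading; it is our reconstruction, not the official GIFT code.

```python
# Hedged reconstruction of GIFT: the residual is generated from the frozen
# weight by a shared pair of small linear maps (phi, psi) = Theta.
import torch
import torch.nn as nn

class GIFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, phi: nn.Parameter, psi: nn.Parameter):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # omega stays frozen
        self.phi, self.psi = phi, psi             # shared Theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        omega = self.base.weight                  # (d_out, d_in)
        delta = (omega @ self.phi) @ self.psi     # generated residual
        return self.base(x) + nn.functional.linear(x, delta)

# One shared Theta for every adapted layer (d_in must match across them);
# pass phi and psi to the optimizer once, not once per wrapped layer.
d_in, r = 768, 16
phi = nn.Parameter(torch.randn(d_in, r) * 0.01)
psi = nn.Parameter(torch.zeros(r, d_in))
```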
arXiv Detail & Related papers (2023-12-01T16:33:57Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning.
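For reference, the $\Delta W = BA$ update quoted in the abstract above corresponds to a layer like the following; this is a minimal sketch, and the reference implementation lives at https://github.com/microsoft/LoRA.

```python
# Minimal LoRA layer: frozen base weight plus a low-rank trainable update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # frozen pretrained W
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # zero init: Delta W = 0 at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scaling
```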
arXiv Detail & Related papers (2021-06-17T17:37:18Z) - Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases [73.53227696624306]
We present a new family of algorithms for learning Fourier-sparse set functions.
In contrast to other work that focused on the Walsh-Hadamard transform, our novel algorithms operate with recently introduced non-orthogonal Fourier transforms.
We demonstrate effectiveness on several real-world applications.
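For context on the orthogonal baseline these algorithms move beyond: a set function on $n$ elements can be indexed by bitmasks of length $2^n$, and its Walsh-Hadamard spectrum computed with the standard fast transform, as sketched below. The paper's non-orthogonal transforms are different objects; this is only the classical starting point.

```python
# Fast Walsh-Hadamard transform of a set function f: 2^{[n]} -> R,
# stored as an array of length 2**n indexed by subset bitmask.
import numpy as np

def walsh_hadamard(f: np.ndarray) -> np.ndarray:
    h = f.astype(float)
    step = 1
    while step < len(h):
        # Standard in-place butterfly over pairs (j, j + step).
        for i in range(0, len(h), 2 * step):
            for j in range(i, i + step):
                h[j], h[j + step] = h[j] + h[j + step], h[j] - h[j + step]
        step *= 2
    return h / len(h)  # one common normalization; f is Fourier-sparse
                       # when few entries of this spectrum are nonzero
```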
arXiv Detail & Related papers (2020-10-01T14:31:59Z)