Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
- URL: http://arxiv.org/abs/2405.03003v1
- Date: Sun, 5 May 2024 17:15:24 GMT
- Title: Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
- Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li
- Abstract summary: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models.
We introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients.
Our method shows comparable or better performance with fewer parameters than LoRA on various tasks.
- Score: 26.563344030824414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank adaptation~(LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W=BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by enjoying the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover $\Delta W$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at \url{https://github.com/Chaos96/fourierft}.
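Since the abstract fully specifies the mechanism (a few learned spectral coefficients at fixed locations, recovered to $\Delta W$ by an inverse 2D DFT), a minimal PyTorch sketch is possible. The module name, coefficient count, and scaling default below are illustrative assumptions; the released code at https://github.com/Chaos96/fourierft is authoritative.

```python
# Minimal sketch of the FourierFT idea, not the authors' released code.
import torch
import torch.nn as nn

class FourierFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, n_coeffs: int = 1000, scale: float = 300.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze pretrained W
        d_out, d_in = base.weight.shape
        self.scale = scale
        # Fixed random spectral locations, identical at train and test time.
        idx = torch.randperm(d_out * d_in)[:n_coeffs]
        self.register_buffer("rows", idx // d_in)
        self.register_buffer("cols", idx % d_in)
        # The only trainable parameters: n spectral coefficients.
        self.coeffs = nn.Parameter(torch.zeros(n_coeffs))

    def delta_w(self) -> torch.Tensor:
        d_out, d_in = self.base.weight.shape
        spectrum = torch.zeros(d_out, d_in, dtype=torch.cfloat,
                               device=self.coeffs.device)
        spectrum[self.rows, self.cols] = self.coeffs.to(torch.cfloat)
        # Inverse 2D DFT recovers the spatial-domain weight change.
        return torch.fft.ifft2(spectrum).real * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_w().t()
```

An adapted layer then stores only $n$ trainable scalars plus two fixed index buffers, rather than LoRA's $r(d_{in} + d_{out})$ parameters per layer, which is consistent with the 0.064M-versus-33.5M comparison quoted above.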
Related papers
- ReFT: Representation Finetuning for Language Models [74.51093640257892]
We develop a family of Representation Finetuning (ReFT) methods.
ReFTs operate on a frozen base model and learn task-specific interventions on hidden representations.
We showcase LoReFT, a low-rank instance of the ReFT family, on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction-tuning, and GLUE.
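As we read the ReFT paper, the low-rank instance (LoReFT) edits a hidden state as $\Phi(h) = h + R^\top(Wh + b - Rh)$ with a row-orthonormal $R$, confining the edit to an $r$-dimensional subspace. Below is a minimal sketch of that intervention; the naming is ours, not the authors' pyreft API.

```python
# Hedged sketch of a LoReFT-style intervention on hidden representations.
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # Orthogonal parametrization keeps the rows of R orthonormal.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(d_model, rank, bias=False))
        self.proj = nn.Linear(d_model, rank)   # computes W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Applied to hidden states of a frozen base model at chosen
        # layers/positions; only this module is trained.
        Rh = self.R(h)                          # project h into the subspace
        return h + (self.proj(h) - Rh) @ self.R.weight
```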
arXiv Detail & Related papers (2024-04-04T17:00:37Z) - Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [92.38964762187477]
We propose LoRAT, a method that unveils the power of larger Vision Transformers (ViT) for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency.
We design an anchor-free head solely based on a multilayer perceptron (MLP) to adapt PETR, enabling better performance with less computational overhead.
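The summary gives the design constraint (anchor-free, MLP-only) but not the exact architecture, so the following is only a generic illustration of such a head, not LoRAT's actual implementation.

```python
# Hypothetical MLP-only anchor-free head: each ViT token predicts a
# foreground score and a normalized box, with no convolutions or anchors.
import torch
import torch.nn as nn

class MLPAnchorFreeHead(nn.Module):
    def __init__(self, d_model: int, hidden: int = 256):
        super().__init__()
        def mlp(d_out: int) -> nn.Sequential:
            return nn.Sequential(nn.Linear(d_model, hidden), nn.GELU(),
                                 nn.Linear(hidden, d_out))
        self.cls = mlp(1)   # per-token foreground score
        self.box = mlp(4)   # per-token normalized box (cx, cy, w, h)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, d_model) from the search-region ViT
        return self.cls(tokens).squeeze(-1), self.box(tokens).sigmoid()
```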
arXiv Detail & Related papers (2024-03-08T11:41:48Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Building on these findings and aiming to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA).
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
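A minimal sketch of that decomposition, assuming the common reading of DoRA, $W' = m \cdot (W_0 + BA)/\lVert W_0 + BA\rVert_c$ with $m$ initialized to the column-wise norms of $W_0$; names and defaults are illustrative, not the official implementation.

```python
# Hedged sketch of DoRA: trainable magnitude + LoRA-updated direction.
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        w0 = base.weight.detach().clone()     # frozen pretrained W0, (d_out, d_in)
        self.register_buffer("w0", w0)
        self.bias = base.bias                 # reuse the pretrained bias
        d_out, d_in = w0.shape
        # Trainable magnitude, one entry per output unit (the column-wise
        # norm ||.||_c in the paper's convention).
        self.m = nn.Parameter(w0.norm(dim=1))
        # LoRA pair updating only the direction component; B is zero-
        # initialized, so the layer starts out exactly equal to W0.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.w0 + self.B @ self.A                        # updated direction
        w = self.m.unsqueeze(1) * v / v.norm(dim=1, keepdim=True)
        return nn.functional.linear(x, w, self.bias)
```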
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa$_\mathrm{Large}$ and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5$_\mathrm{XXL}$.
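The summary does not state the form of the density loss, so the snippet below is a hypothetical stand-in (an L1 surrogate over activations collected via forward hooks), not the paper's actual objective.

```python
# Hypothetical density penalty, added to the task loss during fine-tuning
# to push activation density down; not DEFT's exact formulation.
import torch

def activation_density_loss(acts: list[torch.Tensor], weight: float = 1e-4) -> torch.Tensor:
    # acts: hidden activations gathered from MLP blocks via forward hooks.
    # Mean absolute activation is a smooth surrogate for the fraction of
    # active (nonzero) units.
    return weight * sum(a.abs().mean() for a in acts)
```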
arXiv Detail & Related papers (2024-02-02T21:25:46Z) - GIFT: Generative Interpretable Fine-Tuning [8.481707805559589]
We present Generative Interpretable Fine-Tuning (GIFT) for parameter-efficient fine-tuning of pretrained Transformer backbones.
$\Theta$ can be shared by all layers selected for fine-tuning, or can be layer-type specific.
We show the output of the first linear layer (i.e., $\omega \cdot \phi$) is surprisingly interpretable.
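Reading the summary, $\Theta = (\phi, \psi)$ appears to generate the weight residual from the frozen weight $\omega$ itself, i.e. $\Delta\omega = (\omega\phi)\psi$. The sketch below follows that reading; it is our reconstruction, not the official GIFT code.

```python
# Hedged reconstruction of GIFT: the residual is generated from the frozen
# weight by a shared pair of small linear maps (phi, psi) = Theta.
import torch
import torch.nn as nn

class GIFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, phi: nn.Parameter, psi: nn.Parameter):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # omega stays frozen
        self.phi, self.psi = phi, psi             # shared Theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        omega = self.base.weight                  # (d_out, d_in)
        delta = (omega @ self.phi) @ self.psi     # generated residual
        return self.base(x) + nn.functional.linear(x, delta)

# One shared Theta for every adapted layer (d_in must match across them);
# pass phi and psi to the optimizer once, not once per wrapped layer.
d_in, r = 768, 16
phi = nn.Parameter(torch.randn(d_in, r) * 0.01)
psi = nn.Parameter(torch.zeros(r, d_in))
```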
arXiv Detail & Related papers (2023-12-01T16:33:57Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning.
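For reference, the $\Delta W = BA$ update quoted in the abstract above corresponds to a layer like the following; this is a minimal sketch, and the reference implementation lives at https://github.com/microsoft/LoRA.

```python
# Minimal LoRA layer: frozen base weight plus a low-rank trainable update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # frozen pretrained W
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # zero init: Delta W = 0 at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scaling
```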
arXiv Detail & Related papers (2021-06-17T17:37:18Z) - Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases [73.53227696624306]
We present a new family of algorithms for learning Fourier-sparse set functions.
In contrast to other work that focused on the Walsh-Hadamard transform, our novel algorithms operate with recently introduced non-orthogonal Fourier transforms.
We demonstrate effectiveness on several real-world applications.
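For context on the orthogonal baseline these algorithms move beyond: a set function on $n$ elements can be indexed by bitmasks of length $2^n$, and its Walsh-Hadamard spectrum computed with the standard fast transform, as sketched below. The paper's non-orthogonal transforms are different objects; this is only the classical starting point.

```python
# Fast Walsh-Hadamard transform of a set function f: 2^{[n]} -> R,
# stored as an array of length 2**n indexed by subset bitmask.
import numpy as np

def walsh_hadamard(f: np.ndarray) -> np.ndarray:
    h = f.astype(float)
    step = 1
    while step < len(h):
        # Standard in-place butterfly over pairs (j, j + step).
        for i in range(0, len(h), 2 * step):
            for j in range(i, i + step):
                h[j], h[j + step] = h[j] + h[j + step], h[j] - h[j + step]
        step *= 2
    return h / len(h)  # one common normalization; f is Fourier-sparse
                       # when few entries of this spectrum are nonzero
```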
arXiv Detail & Related papers (2020-10-01T14:31:59Z)