GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models
- URL: http://arxiv.org/abs/2506.11042v1
- Date: Wed, 21 May 2025 08:33:26 GMT
- Title: GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models
- Authors: Baoquan Zhang, Guangning Xu, Michael. K. Ng,
- Abstract summary: Generative Fine-Tuning (GenFT) is a novel method that extracts structured, transferable information from $Delta W$ for efficient $Delta W$ training.<n>Experiments on VTAB-1K, FGVC, and GLUE benchmarks demonstrate that GenFT outperforms state-of-the-art PEFT methods.
- Score: 23.85784506399565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained Foundation Models (PFMs) have transformed numerous applications by enabling efficient adaptation to customized tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a resource-efficient alternative to full fine-tuning, especially leveraging reparameterized weights $\Delta W$ to adapt models for downstream tasks. However, a critical yet underexplored question remains: can we utilize well-pretrained weights $W_0$ to guide the update of task-specific $\Delta W$, avoiding inefficient training it from scratch? To end this, we propose Generative Parameter-Efficient Fine-Tuning (GenFT), a novel method that extracts structured, transferable information from $W_0$ for efficient $\Delta W$ training. To extract row and column structure information, GenFT applies row and column transformations to distill essential patterns from $W_0$. A tailored policy further decomposes $\Delta W$ into layer-shared and layer-specific components, balancing information reuse and individualized flexibility. GenFT is simple yet effective, achieving superior performance across CV and NLP tasks. Extensive experiments on VTAB-1K, FGVC, and GLUE benchmarks demonstrate that GenFT outperforms state-of-the-art PEFT methods, offering a new perspective for efficient model adaptation.
Related papers
- VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models [0.8875650122536799]
We introduce VectorFit, a new way of parameterization that efficiently utilizes the existing knowledge embedded in $W$ by adaptively training their singular vectors and biases.<n>We show that utilizing the structural and transformational properties of $W$ in this way can lead to high-rank incremental weight matrices $Delta W$, comparable to that of full fine-tuning.
arXiv Detail & Related papers (2025-03-25T10:36:27Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.<n>Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework to initializing downstream networks leveraging pre-trained models.
It decomposes pre-trained knowledge into the product of matrices (i.e., $U$, $Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as learngenes''
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
arXiv Detail & Related papers (2024-09-28T08:57:17Z) - Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on the adapter-tuning and hypernetwork.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z) - $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained
Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z) - Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning.
The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.