Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model
Fine-tuning
- URL: http://arxiv.org/abs/2305.15212v1
- Date: Wed, 24 May 2023 14:51:01 GMT
- Title: Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model
Fine-tuning
- Authors: Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang,
Songfang Huang
- Abstract summary: We propose Adaptive Prefix Tuning (APT) to adjust the prefix in terms of both fine-grained token level and coarse-grained layer level with a gate mechanism.
Experiments on the SuperGLUE and NER datasets show the effectiveness of APT.
- Score: 32.84435258519842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning all parameters of large pre-trained language models on
various downstream tasks is prohibitively expensive. Parameter-efficient
fine-tuning has therefore attracted attention: it optimizes only a small number
of task-specific parameters while keeping the pre-trained model frozen. In this
work, we focus on prefix tuning, which optimizes only continuous prefix vectors
(i.e., pseudo tokens) inserted into the Transformer layers. Based on the
observation that the syntactic and semantic representations learned at
different layers vary considerably, we argue that a prefix adapted to each
layer makes fine-tuning more effective and efficient than a fixed one. We
therefore propose Adaptive Prefix Tuning (APT), which adjusts the prefix at
both the fine-grained token level and the coarse-grained layer level with a
gate mechanism. Experiments on the SuperGLUE and NER datasets show the
effectiveness of APT. In addition, using the gate as a probe, we validate the
efficiency and effectiveness of the variable prefix.
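The abstract names the gate mechanism but not its form, so the following minimal PyTorch sketch illustrates the idea of a gated, per-layer prefix: a fine-grained gate computed from each input token and a coarse-grained scalar gate for the whole layer rescale how much of the prefix is used. All names, shapes, and the exact gate parameterization below are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: the gate parameterization, shapes, and names are
# assumptions, not the APT authors' implementation.
import torch
import torch.nn as nn


class GatedLayerPrefix(nn.Module):
    """One layer's trainable prefix ("pseudo tokens") plus two gates:
    a fine-grained gate per input token and a coarse-grained scalar gate
    for the whole layer."""

    def __init__(self, hidden_size: int, prefix_len: int = 16):
        super().__init__()
        # Continuous prefix vectors inserted into this Transformer layer.
        self.prefix_key = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)
        self.prefix_value = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)
        # Fine-grained (token-level) gate: one value per input token.
        self.token_gate = nn.Linear(hidden_size, 1)
        # Coarse-grained (layer-level) gate: one scalar for the whole layer.
        self.layer_gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the frozen backbone.
        fine = torch.sigmoid(self.token_gate(hidden_states))  # (B, T, 1)
        coarse = torch.sigmoid(self.layer_gate)                # scalar in (0, 1)
        gate = fine * coarse                                    # (B, T, 1)
        batch = hidden_states.size(0)
        # Shared prefix key/value pairs, broadcast over the batch; the gate is
        # meant to rescale how strongly each token attends to these pseudo tokens.
        k = self.prefix_key.unsqueeze(0).expand(batch, -1, -1)
        v = self.prefix_value.unsqueeze(0).expand(batch, -1, -1)
        return k, v, gate


# Quick shape check with dummy hidden states.
layer_prefix = GatedLayerPrefix(hidden_size=768, prefix_len=16)
k, v, gate = layer_prefix(torch.randn(2, 10, 768))
print(k.shape, v.shape, gate.shape)  # (2, 16, 768), (2, 16, 768), (2, 10, 1)
```

Only the prefix and gate parameters above would be trained; the backbone stays frozen, which is what keeps the approach parameter-efficient.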
Related papers
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of prompting and context-based fine-tuning methods to match the performance of full-parameter fine-tuning.
We implement an algorithm that only needs to introduce and fine-tune a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods like full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z)
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both the singular values and the corresponding basis vectors of the pretrained weights.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Prefix Propagation: Parameter-Efficient Tuning for Long Sequences [35.15831629770172]
We propose prefix-propagation, a simple but effective approach that conditions prefixes on previous hidden states.
We empirically demonstrate that prefix-propagation outperforms prefix-tuning across long-document tasks.
To the best of our knowledge, this work is the first to focus on parameter-efficient learning for long-sequence language tasks.
arXiv Detail & Related papers (2023-05-20T04:07:06Z)
- Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z)
- Parameter-Efficient Tuning with Special Token Adaptation [25.37998979962568]
PASTA achieves comparable performance to fine-tuning in natural language understanding tasks.
Our work demonstrates the pivotal role of special tokens in pretrained language models.
arXiv Detail & Related papers (2022-10-10T01:02:51Z)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [85.6357778621526]
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
arXiv Detail & Related papers (2021-01-01T08:00:36Z)
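Several of the entries above build on vanilla prefix-tuning, so a compact sketch of that baseline may be useful: a small table of trainable prefix embeddings is mapped (through a reparameterization MLP, as commonly used) to per-layer key/value pairs that a frozen backbone treats as extra attention context. Dimensions, names, and the MLP below are illustrative assumptions, not the original implementation.

```python
# Minimal prefix-tuning sketch: only the prefix embeddings and the MLP are
# trained; the backbone is assumed frozen. Shapes and names are illustrative.
import torch
import torch.nn as nn


class PrefixEncoder(nn.Module):
    """Maps trainable prefix embeddings to per-layer (key, value) prefixes."""

    def __init__(self, num_layers: int, num_heads: int, head_dim: int,
                 prefix_len: int = 10, bottleneck: int = 512):
        super().__init__()
        hidden = num_heads * head_dim
        self.prefix_len = prefix_len
        self.num_layers, self.num_heads, self.head_dim = num_layers, num_heads, head_dim
        self.embedding = nn.Embedding(prefix_len, hidden)
        # Reparameterization MLP, often used to stabilize prefix training.
        self.mlp = nn.Sequential(
            nn.Linear(hidden, bottleneck),
            nn.Tanh(),
            nn.Linear(bottleneck, num_layers * 2 * hidden),
        )

    def forward(self, batch_size: int):
        idx = torch.arange(self.prefix_len)
        out = self.mlp(self.embedding(idx))                      # (P, L * 2 * hidden)
        out = out.view(self.prefix_len, self.num_layers, 2,
                       self.num_heads, self.head_dim)
        out = out.permute(1, 2, 3, 0, 4)                         # (L, 2, heads, P, dim)
        # One (key, value) prefix per layer, broadcast over the batch.
        return [(k.unsqueeze(0).expand(batch_size, -1, -1, -1),
                 v.unsqueeze(0).expand(batch_size, -1, -1, -1))
                for k, v in out]


encoder = PrefixEncoder(num_layers=12, num_heads=12, head_dim=64)
past_key_values = encoder(batch_size=2)
print(len(past_key_values), past_key_values[0][0].shape)  # 12, (2, 12, 10, 64)
```

APT, as described in the abstract above, keeps this basic setup but adds a gate that decides how much of the prefix each token and each layer actually uses.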
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.