Prefix-Tuning: Optimizing Continuous Prompts for Generation
- URL: http://arxiv.org/abs/2101.00190v1
- Date: Fri, 1 Jan 2021 08:00:36 GMT
- Title: Prefix-Tuning: Optimizing Continuous Prompts for Generation
- Authors: Xiang Lisa Li and Percy Liang
- Abstract summary: Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
- Score: 85.6357778621526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning is the de facto way to leverage large pretrained language models
to perform downstream tasks. However, it modifies all the language model
parameters and therefore necessitates storing a full copy for each task. In
this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning
for natural language generation tasks, which keeps language model parameters
frozen, but optimizes a small continuous task-specific vector (called the
prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent
tokens to attend to this prefix as if it were "virtual tokens". We apply
prefix-tuning to GPT-2 for table-to-text generation and to BART for
summarization. We find that by learning only 0.1% of the parameters,
prefix-tuning obtains comparable performance in the full data setting,
outperforms fine-tuning in low-data settings, and extrapolates better to
examples with topics unseen during training.
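As a rough illustration of the mechanism described in the abstract (a minimal sketch, not the authors' implementation), the code below trains only a pair of prefix key/value vectors inside a single frozen self-attention layer, so every input token attends to the prefix as if it were virtual tokens. The class name, dimensions, and initialization scale are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixSelfAttention(nn.Module):
    """Single-head self-attention with a trainable prefix (illustrative only)."""
    def __init__(self, d_model=64, prefix_len=10):
        super().__init__()
        # Stand-ins for the frozen pretrained projections.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            for p in proj.parameters():
                p.requires_grad = False
        # The continuous task-specific prefix: the only trainable parameters.
        self.prefix_k = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))
        self.prefix_v = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))

    def forward(self, x):  # x: (batch, seq_len, d_model); causal mask omitted for brevity
        b = x.size(0)
        q = self.q_proj(x)
        # Prepend the prefix so subsequent tokens attend to it like "virtual tokens".
        k = torch.cat([self.prefix_k.expand(b, -1, -1), self.k_proj(x)], dim=1)
        v = torch.cat([self.prefix_v.expand(b, -1, -1), self.v_proj(x)], dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v

layer = PrefixSelfAttention()
out = layer(torch.randn(2, 5, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # only the prefix parameters are trainable
```

In the full method the prefix is prepended at every layer of GPT-2 or BART, not just one; the single-layer version above only shows the attention-level effect.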
Related papers
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of prompting and context-based fine-tuning methods to match the performance of full-parameter fine-tuning.
We implement an algorithm that introduces and fine-tunes only a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods such as full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z)
- PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation [61.05254852400895]
Parse-Instructed Prefix (PIP) is a novel adaptation of prefix-tuning to tune large pre-trained language models.
In contrast to traditional fine-tuning methods for this task, PIP is a compute-efficient alternative with 10 times fewer learnable parameters.
arXiv Detail & Related papers (2023-05-26T07:42:38Z)
- Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning [32.84435258519842]
We propose Adaptive Prefix Tuning (APT), which adjusts the prefix at both the fine-grained token level and the coarse-grained layer level via a gate mechanism.
Experiments on the SuperGLUE and NER datasets show the effectiveness of APT.
arXiv Detail & Related papers (2023-05-24T14:51:01Z)
- Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning [53.72897232951918]
We suggest a new variant of prefix-tuning, inducer-tuning, which shares the same mechanism as prefix-tuning while leveraging the residual form found in adapter-tuning.
We show that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
arXiv Detail & Related papers (2022-10-26T04:39:42Z)
- PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion [108.8941541255567]
This paper presents a parameter-lite transfer learning approach that adapts pretrained language models (LMs) for knowledge graph (KG) completion.
Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed.
We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-25T02:22:29Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimizing continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to results comparable to gradient-based optimization (a toy sketch of derivative-free prompt search appears after this list).
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Unfreeze with Care: Space-Efficient Fine-Tuning of Semantic Parsing Models [5.893781742558463]
We examine two promising techniques, prefix tuning and bias-term tuning, specifically on semantic parsing.
We compare them against each other on two different semantic parsing datasets, and we also compare them against full and partial fine-tuning, both in few-shot and conventional data settings.
While prefix tuning is shown to do poorly for semantic parsing tasks off the shelf, we modify it by adding special token embeddings, which results in very strong performance without compromising parameter savings.
arXiv Detail & Related papers (2022-03-05T04:30:03Z)
- Discourse-Aware Prompt Design for Text Generation [13.835916386769474]
We show that prompt-based conditional text generation can be improved with simple and efficient methods.
First, we show that the higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters.
Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations on the softmax function.
arXiv Detail & Related papers (2021-12-10T18:15:44Z)
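A toy sketch of the derivative-free approach mentioned in the BBTv2 entry above: a continuous prompt is improved purely from scalar scores returned by a black-box model, with no gradients. The objective below is a synthetic placeholder, and simple random search stands in for the stronger optimizers such methods actually use; all names and shapes are illustrative assumptions.

```python
import torch

def black_box_score(prompt: torch.Tensor) -> float:
    # Synthetic stand-in for a frozen LM scored on a downstream task:
    # only the returned scalar is observable, never any gradient.
    target = torch.linspace(-1.0, 1.0, prompt.numel()).reshape(prompt.shape)
    return -float(((prompt - target) ** 2).sum())  # higher is better

def random_search(prompt_len=8, d_model=16, steps=200, sigma=0.1, seed=0):
    """Derivative-free search over a continuous prompt (illustrative only)."""
    g = torch.Generator().manual_seed(seed)
    prompt = torch.zeros(prompt_len, d_model)
    best = black_box_score(prompt)
    for _ in range(steps):
        candidate = prompt + sigma * torch.randn(prompt.shape, generator=g)
        score = black_box_score(candidate)  # forward queries only
        if score > best:
            prompt, best = candidate, score
    return prompt, best

prompt, best = random_search()
print(best)  # the prompt improves using black-box queries alone
```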