Prefix-Tuning: Optimizing Continuous Prompts for Generation
- URL: http://arxiv.org/abs/2101.00190v1
- Date: Fri, 1 Jan 2021 08:00:36 GMT
- Title: Prefix-Tuning: Optimizing Continuous Prompts for Generation
- Authors: Xiang Lisa Li and Percy Liang
- Abstract summary: Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
- Score: 85.6357778621526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning is the de facto way to leverage large pretrained language models
to perform downstream tasks. However, it modifies all the language model
parameters and therefore necessitates storing a full copy for each task. In
this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning
for natural language generation tasks, which keeps language model parameters
frozen, but optimizes a small continuous task-specific vector (called the
prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent
tokens to attend to this prefix as if it were "virtual tokens". We apply
prefix-tuning to GPT-2 for table-to-text generation and to BART for
summarization. We find that by learning only 0.1% of the parameters,
prefix-tuning obtains comparable performance in the full data setting,
outperforms fine-tuning in low-data settings, and extrapolates better to
examples with topics unseen during training.
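As a rough illustration of the mechanism described in the abstract (a minimal sketch, not the authors' implementation), the code below trains only a pair of prefix key/value vectors inside a single frozen self-attention layer, so every input token attends to the prefix as if it were virtual tokens. The class name, dimensions, and initialization scale are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixSelfAttention(nn.Module):
    """Single-head self-attention with a trainable prefix (illustrative only)."""
    def __init__(self, d_model=64, prefix_len=10):
        super().__init__()
        # Stand-ins for the frozen pretrained projections.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            for p in proj.parameters():
                p.requires_grad = False
        # The continuous task-specific prefix: the only trainable parameters.
        self.prefix_k = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))
        self.prefix_v = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))

    def forward(self, x):  # x: (batch, seq_len, d_model); causal mask omitted for brevity
        b = x.size(0)
        q = self.q_proj(x)
        # Prepend the prefix so subsequent tokens attend to it like "virtual tokens".
        k = torch.cat([self.prefix_k.expand(b, -1, -1), self.k_proj(x)], dim=1)
        v = torch.cat([self.prefix_v.expand(b, -1, -1), self.v_proj(x)], dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v

layer = PrefixSelfAttention()
out = layer(torch.randn(2, 5, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # only the prefix parameters are trainable
```

In the full method the prefix is prepended at every layer of GPT-2 or BART, not just one; the single-layer version above only shows the attention-level effect.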
Related papers
- Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of prompting and context-based fine-tuning methods to match the performance of full-parameter fine-tuning.
We implement an algorithm that introduces and fine-tunes only a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods such as full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z)
- PIP: Parse-Instructed Prefix for Syntactically Controlled Paraphrase Generation [61.05254852400895]
Parse-Instructed Prefix (PIP) is a novel adaptation of prefix-tuning to tune large pre-trained language models.
In contrast to traditional fine-tuning methods for this task, PIP is a compute-efficient alternative with 10 times fewer learnable parameters.
arXiv Detail & Related papers (2023-05-26T07:42:38Z)
- Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning [32.84435258519842]
We propose Adaptive Prefix Tuning (APT), which adjusts the prefix at both the fine-grained token level and the coarse-grained layer level via a gate mechanism.
Experiments on the SuperGLUE and NER datasets show the effectiveness of APT.
arXiv Detail & Related papers (2023-05-24T14:51:01Z)
- Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning [53.72897232951918]
We suggest a new variant of prefix-tuning, inducer-tuning, which shares the same mechanism as prefix-tuning while leveraging the residual form found in adapter-tuning.
We show that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
arXiv Detail & Related papers (2022-10-26T04:39:42Z)
- PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion [108.8941541255567]
This paper presents a parameter-lite transfer learning approach that adapts pretrained language models (LMs) for knowledge graph (KG) completion.
Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed.
We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-25T02:22:29Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimizing continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to results comparable to gradient-based optimization (a toy sketch of derivative-free prompt search appears after this list).
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Unfreeze with Care: Space-Efficient Fine-Tuning of Semantic Parsing Models [5.893781742558463]
We examine two promising techniques, prefix tuning and bias-term tuning, specifically on semantic parsing.
We compare them against each other on two different semantic parsing datasets, and we also compare them against full and partial fine-tuning, both in few-shot and conventional data settings.
While prefix tuning is shown to do poorly for semantic parsing tasks off the shelf, we modify it by adding special token embeddings, which results in very strong performance without compromising parameter savings.
arXiv Detail & Related papers (2022-03-05T04:30:03Z)
- Discourse-Aware Prompt Design for Text Generation [13.835916386769474]
We show that prompt-based conditional text generation can be improved with simple and efficient methods.
First, we show that the higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters.
Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations on the softmax function.
arXiv Detail & Related papers (2021-12-10T18:15:44Z)
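A toy sketch of the derivative-free approach mentioned in the BBTv2 entry above: a continuous prompt is improved purely from scalar scores returned by a black-box model, with no gradients. The objective below is a synthetic placeholder, and simple random search stands in for the stronger optimizers such methods actually use; all names and shapes are illustrative assumptions.

```python
import torch

def black_box_score(prompt: torch.Tensor) -> float:
    # Synthetic stand-in for a frozen LM scored on a downstream task:
    # only the returned scalar is observable, never any gradient.
    target = torch.linspace(-1.0, 1.0, prompt.numel()).reshape(prompt.shape)
    return -float(((prompt - target) ** 2).sum())  # higher is better

def random_search(prompt_len=8, d_model=16, steps=200, sigma=0.1, seed=0):
    """Derivative-free search over a continuous prompt (illustrative only)."""
    g = torch.Generator().manual_seed(seed)
    prompt = torch.zeros(prompt_len, d_model)
    best = black_box_score(prompt)
    for _ in range(steps):
        candidate = prompt + sigma * torch.randn(prompt.shape, generator=g)
        score = black_box_score(candidate)  # forward queries only
        if score > best:
            prompt, best = candidate, score
    return prompt, best

prompt, best = random_search()
print(best)  # the prompt improves using black-box queries alone
```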