How Does In-Context Learning Help Prompt Tuning?
- URL: http://arxiv.org/abs/2302.11521v1
- Date: Wed, 22 Feb 2023 17:45:12 GMT
- Title: How Does In-Context Learning Help Prompt Tuning?
- Authors: Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, Mohit Iyyer
- Abstract summary: Fine-tuning large language models is becoming ever more impractical due to their rapidly-growing scale.
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model.
Recently, Singhal et al. (2022) propose ``instruction prompt tuning'' (IPT), which combines PT with ICL by concatenating a natural language demonstration with learned prompt embeddings.
- Score: 55.78535874154915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning large language models is becoming ever more impractical due to
their rapidly-growing scale. This motivates the use of parameter-efficient
adaptation methods such as prompt tuning (PT), which adds a small number of
tunable embeddings to an otherwise frozen model, and in-context learning (ICL),
in which demonstrations of the task are provided to the model in natural
language without any additional training. Recently, Singhal et al. (2022)
propose ``instruction prompt tuning'' (IPT), which combines PT with ICL by
concatenating a natural language demonstration with learned prompt embeddings.
While all of these methods have proven effective on different tasks, how they
interact with each other remains unexplored. In this paper, we empirically
study when and how in-context examples improve prompt tuning by measuring the
effectiveness of ICL, PT, and IPT on five text generation tasks with multiple
base language models. We observe that (1) IPT does \emph{not} always outperform
PT, and in fact requires the in-context demonstration to be semantically
similar to the test input to yield improvements; (2) PT is unstable and
exhibits high variance, but combining PT and ICL (into IPT) consistently
reduces variance across all five tasks; and (3) prompts learned for a specific
source task via PT exhibit positive transfer when paired with in-context
examples of a different target task. Our results offer actionable insights on
choosing a suitable parameter-efficient adaptation method for a given task.
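
To make the three methods concrete, the sketch below shows how PT and IPT assemble the model input in PyTorch. It is a minimal illustration, assuming a HuggingFace-style frozen model that accepts an `inputs_embeds` argument; the `frozen_lm` handle, prompt length, and initialization scale are illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    """Prompt tuning (PT): prepend a short sequence of trainable embeddings
    to the input of an otherwise frozen language model."""

    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.lm = frozen_lm
        for p in self.lm.parameters():
            p.requires_grad = False                      # only the prompt is tuned
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds, demo_embeds=None):
        # input_embeds: (batch, seq, dim) embeddings of the test input.
        # demo_embeds:  (batch, demo_seq, dim) embeddings of a natural-language
        #               demonstration; supplying one turns PT into IPT.
        prompt = self.soft_prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        if demo_embeds is None:
            parts = [prompt, input_embeds]               # PT:  [prompt; input]
        else:
            parts = [prompt, demo_embeds, input_embeds]  # IPT: [prompt; demo; input]
        return self.lm(inputs_embeds=torch.cat(parts, dim=1))
```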
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, taking into account the unique structure of the input and output formats.
Whereas adversarial attacks adjust the input to maximize the loss, we adjust it based on the labels present in the context so as to minimize the loss.
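
A rough sketch of that idea, assuming a HuggingFace-style model that accepts `inputs_embeds` and `labels`: the context embeddings are treated as free variables and optimized by gradient descent. CPT restricts updates to specific context tokens; this simplification updates the whole context.

```python
import torch

def refine_context(model, context_embeds, label_ids, steps=10, lr=1e-3):
    """Treat the context-token embeddings as free variables and take gradient
    steps that *minimize* the loss on the labels in the context -- the
    sign-flipped counterpart of an adversarial attack, which maximizes it."""
    ctx = context_embeds.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([ctx], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = model(inputs_embeds=ctx, labels=label_ids).loss  # assumed HF-style API
        loss.backward()                                         # descend, not ascend
        opt.step()
    return ctx.detach()
```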
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models and improve model performance.
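
DETAIL's influence-function estimator is not reproduced here. As a far simpler stand-in that asks the same question, the sketch below scores each demonstration by its leave-one-out loss change, assuming a HuggingFace-style model and tokenizer.

```python
import torch

@torch.no_grad()
def leave_one_out_scores(model, tokenizer, demos, query_with_answer):
    """Score each demonstration by how much removing it changes the LM loss
    on the query (a crude proxy for influence, not DETAIL's estimator)."""
    def lm_loss(subset):
        text = "\n\n".join(subset + [query_with_answer])
        ids = tokenizer(text, return_tensors="pt").input_ids
        return model(ids, labels=ids).loss.item()   # loss over the full text
    full = lm_loss(demos)
    # positive score: removing the demo hurts, so the demo was helpful
    return [lm_loss(demos[:i] + demos[i + 1:]) - full for i in range(len(demos))]
```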
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
The effectiveness of few-shot in-context learning (ICL) hinges on the choice of the few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [14.975436239088312]
We propose DePT, which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates.
We demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios.
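
Read literally, that decomposition might look like the following sketch, where a LoRA-style low-rank pair shifts the frozen input embeddings and the two parameter groups receive different learning rates. The rank, lengths, and learning-rate values are hypothetical.

```python
import torch
import torch.nn as nn

class DePTPrompt(nn.Module):
    """Sketch of the decomposition: a shorter soft prompt plus a low-rank
    pair (A, B) whose product is added to the frozen input embeddings."""

    def __init__(self, embed_dim: int, short_len: int = 8, max_seq: int = 128, rank: int = 4):
        super().__init__()
        self.short_prompt = nn.Parameter(torch.randn(short_len, embed_dim) * 0.02)
        self.A = nn.Parameter(torch.randn(max_seq, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(rank, embed_dim))     # zero init: no-op at start

    def forward(self, input_embeds):                            # (batch, seq, dim)
        seq = input_embeds.size(1)
        updated = input_embeds + self.A[:seq] @ self.B          # low-rank embedding shift
        prompt = self.short_prompt.unsqueeze(0).expand(updated.size(0), -1, -1)
        return torch.cat([prompt, updated], dim=1)

# Two parameter groups, two learning rates (values here are made up):
dept = DePTPrompt(embed_dim=768)
opt = torch.optim.AdamW([
    {"params": [dept.short_prompt], "lr": 3e-1},
    {"params": [dept.A, dept.B], "lr": 1e-4},
])
```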
arXiv Detail & Related papers (2023-09-11T00:02:05Z)
- Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning? [37.522581151997734]
Prompt tuning (PT), which tunes only the embeddings of an additional sequence of tokens per task, has shown remarkable performance in few-shot learning.
We study meta prompt tuning (MPT) to explore whether, and how, meta-learning can improve cross-task generalization.
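
The summary does not spell out the algorithm, so the sketch below uses a first-order (Reptile-style) update as a stand-in for meta-learning a prompt initialization; the `task_loss` callable is an assumed hook supplied by the caller, and MPT's actual procedure may differ.

```python
import torch

def meta_prompt_init(init_prompt, tasks, task_loss, inner_steps=5,
                     inner_lr=1e-2, meta_lr=1e-1):
    """First-order (Reptile-style) meta-learning of a soft-prompt
    initialization; not necessarily MPT's exact algorithm.
    task_loss(prompt, task) -> scalar loss, supplied by the caller."""
    meta = init_prompt.detach().clone()
    for task in tasks:
        fast = meta.clone().requires_grad_(True)
        for _ in range(inner_steps):                     # adapt to a single task
            (grad,) = torch.autograd.grad(task_loss(fast, task), fast)
            fast = (fast - inner_lr * grad).detach().requires_grad_(True)
        meta = meta + meta_lr * (fast.detach() - meta)   # nudge init toward solution
    return meta
```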
arXiv Detail & Related papers (2023-02-16T08:37:22Z)
- SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning [28.29889045842277]
Multitask prompted learning can improve generalization by training on a diverse set of tasks at once.
We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning.
arXiv Detail & Related papers (2022-12-21T11:18:09Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves performance comparable to conventional fine-tuning while tuning only 0.5%-1.5% of the parameters.
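
A minimal illustration of conditioning the prompt on each input instance; the mean-pooling and linear projection below are assumptions made for the sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    """Derive the soft prompt from each input instance rather than sharing
    one learned prompt per task (a sketch of the instance-wise idea)."""

    def __init__(self, embed_dim: int, prompt_len: int = 10):
        super().__init__()
        self.prompt_len = prompt_len
        self.proj = nn.Linear(embed_dim, prompt_len * embed_dim)

    def forward(self, input_embeds):                     # (batch, seq, dim)
        pooled = input_embeds.mean(dim=1)                # crude instance summary
        prompt = self.proj(pooled)                       # instance-conditioned prompt
        prompt = prompt.view(-1, self.prompt_len, input_embeds.size(-1))
        return torch.cat([prompt, input_embeds], dim=1)
```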
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
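
The cost asymmetry is easy to see with a back-of-the-envelope token count: ICL re-encodes all k demonstrations on every prediction, while a tuned model sees only the input. The numbers below are invented for illustration, and PEFT additionally pays a one-off training cost.

```python
def tokens_processed(num_predictions, input_len, k_demos=0, demo_len=0):
    """Tokens the model must process at inference time: with ICL, every
    prediction re-encodes all k demonstrations plus the input."""
    return num_predictions * (k_demos * demo_len + input_len)

icl = tokens_processed(10_000, input_len=100, k_demos=16, demo_len=100)
peft = tokens_processed(10_000, input_len=100)
print(f"ICL: {icl:,} tokens vs PEFT inference: {peft:,} tokens")
# ICL: 17,000,000 tokens vs PEFT inference: 1,000,000 tokens
```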
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning [41.15017636192417]
We present CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning language models.
It integrates a task-invariant continuous prompt encoding technique with fully trainable prompt parameters.
Experiments on a variety of language understanding tasks used in IR systems, across different PLMs, show that CP-Tuning outperforms state-of-the-art methods.
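
CP-Tuning's exact objective is not given in the summary; the generic supervised contrastive loss below is a stand-in that conveys the underlying idea of pulling same-class representations together and pushing different classes apart.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(reps, labels, temperature=0.1):
    """Generic supervised contrastive loss (not CP-Tuning's exact pair-wise
    objective). reps: (batch, dim) representations; labels: (batch,)."""
    reps = F.normalize(reps, dim=-1)
    sims = reps @ reps.t() / temperature                 # scaled cosine similarities
    off_diag = ~torch.eye(len(labels), dtype=torch.bool)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
    denom = torch.logsumexp(sims.masked_fill(~off_diag, float("-inf")),
                            dim=1, keepdim=True)         # exclude self-pairs
    log_prob = sims - denom
    # mean log-probability of positive (same-class) pairs per anchor
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```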
arXiv Detail & Related papers (2022-04-01T02:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.