Continued Pretraining for Better Zero- and Few-Shot Promptability
- URL: http://arxiv.org/abs/2210.10258v2
- Date: Fri, 21 Oct 2022 01:24:29 GMT
- Title: Continued Pretraining for Better Zero- and Few-Shot Promptability
- Authors: Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk
Groeneveld, Sameer Singh, Iz Beltagy
- Abstract summary: We show that a simple recipe, continued pretraining that incorporates a trainable prompt during multi-task learning, leads to improved promptability in both zero- and few-shot settings.
On the other hand, continued pretraining using MAML-style meta-learning, a method that directly optimizes few-shot promptability, yields subpar performance.
- Score: 44.381944544918014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently introduced language model prompting methods can achieve high
accuracy in zero- and few-shot settings while requiring few to no learned
task-specific parameters. Nevertheless, these methods still often trail behind
full model finetuning. In this work, we investigate if a dedicated continued
pretraining stage could improve "promptability", i.e., zero-shot performance
with natural language prompts or few-shot performance with prompt tuning. We
reveal settings where existing continued pretraining methods lack
promptability. We also identify current methodological gaps, which we fill with
thorough large-scale experiments. We demonstrate that a simple recipe,
continued pretraining that incorporates a trainable prompt during multi-task
learning, leads to improved promptability in both zero- and few-shot settings
compared to existing methods, up to 31% relative. On the other hand, we find
that continued pretraining using MAML-style meta-learning, a method that
directly optimizes few-shot promptability, yields subpar performance. We
validate our findings with two prompt tuning methods, and, based on our
results, we provide concrete recommendations to optimize promptability for
different use cases.
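As a rough, hypothetical illustration of the recipe the abstract describes (continued pretraining that incorporates a trainable prompt during multi-task learning), the PyTorch sketch below prepends a shared trainable soft prompt to the input embeddings and updates it jointly with the model. The model choice, prompt length, and toy task mixture are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch, not the authors' released code: continued pretraining
# on a multi-task mixture with a shared trainable soft prompt prepended to
# every input. Model choice, prompt length, and data are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

PROMPT_LEN = 100  # assumed number of soft-prompt vectors
embed = model.get_input_embeddings()
soft_prompt = nn.Parameter(torch.randn(PROMPT_LEN, embed.embedding_dim) * 0.02)

# During *continued pretraining*, both the LM and the prompt are updated.
# For downstream few-shot *prompt tuning*, one would instead freeze the LM
# and optimize only the soft prompt. (The abstract reports that a MAML-style
# meta-learning variant of this stage performed worse.)
optimizer = torch.optim.AdamW(list(model.parameters()) + [soft_prompt], lr=1e-4)

def multitask_step(input_texts, target_texts):
    enc = tokenizer(input_texts, return_tensors="pt", padding=True)
    labels = tokenizer(target_texts, return_tensors="pt", padding=True).input_ids
    x = embed(enc.input_ids)                                   # (B, T, D)
    prompt = soft_prompt.unsqueeze(0).expand(x.size(0), -1, -1)
    x = torch.cat([prompt, x], dim=1)                          # prepend prompt
    mask = torch.cat(
        [torch.ones(x.size(0), PROMPT_LEN, dtype=enc.attention_mask.dtype),
         enc.attention_mask],
        dim=1,
    )
    loss = model(inputs_embeds=x, attention_mask=mask, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Toy stand-in for one prompted example from a multi-task mixture.
multitask_step(["sst2 sentence: a gorgeous, moving film"], ["positive"])
```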
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce high-quality prompts for offline reinforcement learning tasks.
We show that the Prompt Diffuser is a robust and effective tool for prompt tuning, demonstrating strong performance on meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z) - Progressive Prompts: Continual Learning for Language Models [38.80713056417491]
We introduce Progressive Prompts, a simple and efficient approach to continual learning in language models.
Progressive Prompts learns a new soft prompt for each task and sequentially concatenates it with the previously learned prompts (a minimal sketch of this appears after this list).
Experiments on standard continual learning benchmarks show that our approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T00:17:38Z) - TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves a 5.33x average improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z) - Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning [83.10861551885321]
We present Multi-task Pre-trained Modular Prompt (MP2) to boost prompt tuning for few-shot learning.
MP2 is a set of combinable prompts pre-trained on 38 Chinese tasks.
We show MP2 significantly outperforms prompt tuning, full model tuning, and prior prompt pre-training methods in few-shot settings.
arXiv Detail & Related papers (2022-10-14T06:43:42Z) - Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves performance comparable to conventional finetuning while tuning only 0.5%-1.5% of the parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z) - DualPrompt: Complementary Prompting for Rehearsal-free Continual
Learning [39.53513975439818]
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting.
We present DualPrompt, which learns a tiny set of parameters, called prompts, to instruct a pre-trained model to learn tasks arriving sequentially.
With extensive experimental validation, DualPrompt consistently sets state-of-the-art performance under the challenging class-incremental setting.
arXiv Detail & Related papers (2022-04-10T23:36:55Z) - AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z) - PPT: Pre-trained Prompt Tuning for Few-shot Learning [47.05554619258627]
Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks.
Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks.
In our work, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings.
arXiv Detail & Related papers (2021-09-09T15:11:04Z) - Making Pre-trained Language Models Better Few-shot Learners [11.90626040104822]
The recent GPT-3 model achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context.
Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient.
We present LM-BFF (better few-shot fine-tuning of language models), a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
arXiv Detail & Related papers (2020-12-31T17:21:26Z)