Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
in Natural Language Processing
- URL: http://arxiv.org/abs/2107.13586v1
- Date: Wed, 28 Jul 2021 18:09:46 GMT
- Title: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
in Natural Language Processing
- Authors: Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi,
Graham Neubig
- Abstract summary: This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
- Score: 78.8500633981247
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper surveys and organizes research works in a new paradigm in natural
language processing, which we dub "prompt-based learning". Unlike traditional
supervised learning, which trains a model to take in an input x and predict an
output y as P(y|x), prompt-based learning is based on language models that
model the probability of text directly. To use these models to perform
prediction tasks, the original input x is modified using a template into a
textual string prompt x' that has some unfilled slots, and then the language
model is used to probabilistically fill the unfilled information to obtain a
final string x̂, from which the final output y can be derived. This framework is
powerful and attractive for a number of reasons: it allows the language model
to be pre-trained on massive amounts of raw text, and by defining a new
prompting function the model is able to perform few-shot or even zero-shot
learning, adapting to new scenarios with few or no labeled data. In this paper
we introduce the basics of this promising paradigm, describe a unified set of
mathematical notations that can cover a wide variety of existing work, and
organize existing work along several dimensions, e.g., the choice of pre-trained
models, prompts, and tuning strategies. To make the field more accessible to
interested beginners, we not only make a systematic review of existing works
and a highly structured typology of prompt-based concepts, but also release
other resources, e.g., a website (http://pretrain.nlpedia.ai/) that includes a
constantly-updated survey and paper list.
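To make the workflow described above concrete, here is a minimal sketch of cloze-style prompt-based prediction applied to zero-shot sentiment classification. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint; the template and the verbalizer (the answer-to-label mapping) are illustrative choices for this sketch, not ones prescribed by the survey.

```python
# Minimal sketch of the pre-train, prompt, predict loop for zero-shot
# sentiment classification. Assumes HuggingFace `transformers` and
# `bert-base-uncased`; the template and verbalizer are illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def f_prompt(x: str) -> str:
    # Prompting function: wrap the input x in a template containing an
    # unfilled slot ([MASK] is the slot token for a masked language model).
    return f"{x} Overall, it was a [MASK] movie."

# Verbalizer: maps answers the model may fill in to final labels y.
verbalizer = {
    "great": "positive", "good": "positive", "fantastic": "positive",
    "terrible": "negative", "bad": "negative", "boring": "negative",
}

def predict(x: str) -> str:
    # Answer search: let the language model fill the slot, then take the
    # highest-scoring candidate that the verbalizer can map to a label.
    for candidate in fill_mask(f_prompt(x), top_k=50):
        word = candidate["token_str"].strip().lower()
        if word in verbalizer:
            return verbalizer[word]
    return "unknown"

print(predict("I could not stop laughing from start to finish."))
```

No labeled training data is used here: the pre-trained masked language model and the hand-written template do all the work, which is the few-shot/zero-shot setting the abstract describes.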
Related papers
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation
Models [43.35892536887404]
Prompt engineering involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks.
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models.
arXiv Detail & Related papers (2023-07-24T17:58:06Z)
- FILM: How can Few-Shot Image Classification Benefit from Pre-Trained
Language Models? [14.582209994281374]
Few-shot learning aims to train models that can be generalized to novel classes with only a few samples.
We propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning.
arXiv Detail & Related papers (2023-07-09T08:07:43Z)
- Prompt Learning for News Recommendation [2.6524289609910654]
Some recent news recommendation (NR) methods encode news representation by following the vanilla pre-train and fine-tune paradigm with carefully-designed recommendation-specific neural networks and objective functions.
We argue that their modeling paradigm has not well exploited the abundant semantic information and linguistic knowledge embedded in the pre-training process.
We develop a Prompt Learning for News Recommendation (Prompt4NR) framework, which transforms the task of predicting whether a user would click a candidate news article into a cloze-style mask-prediction task.
arXiv Detail & Related papers (2023-04-11T14:56:06Z)
- Foundation Models for Natural Language Processing -- Pre-trained
Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models [18.49325959450621]
We introduce TextPruner, an open-source model pruning toolkit for pre-trained language models.
TextPruner offers structured post-training pruning methods, including vocabulary pruning and transformer pruning.
Our experiments with several NLP tasks demonstrate the ability of TextPruner to reduce the model size without re-training the model.
arXiv Detail & Related papers (2022-03-30T02:10:33Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
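The last entry above (How Context Affects Language Models' Factual Predictions) pairs a retrieval system with a pre-trained language model in a purely unsupervised way. The following is a hypothetical sketch of that retrieve-then-prompt idea, again assuming the HuggingFace transformers library; the toy keyword-overlap retriever and the in-memory passages are stand-ins for illustration, not the paper's actual system.

```python
# Illustrative retrieve-then-prompt sketch: fetch context for a cloze-style
# factual query, prepend it, and let a pre-trained masked LM fill the slot.
# The retriever is a toy keyword-overlap ranker; a real setup would use
# BM25 or a dense index over a large corpus.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

passages = [
    "Dante Alighieri was born in Florence in 1265.",
    "The Divine Comedy is a narrative poem written in Italian.",
]

def retrieve(query: str) -> str:
    # Toy unsupervised retrieval: return the passage that shares the most
    # words with the query.
    q = set(query.lower().split())
    return max(passages, key=lambda p: len(q & set(p.lower().split())))

def answer(cloze_query: str) -> str:
    # Prepend the retrieved passage as context, then fill the [MASK] slot.
    context = retrieve(cloze_query)
    top_prediction = fill_mask(f"{context} {cloze_query}")[0]
    return top_prediction["token_str"].strip()

print(answer("Dante was born in [MASK]."))
```

Prepending retrieved text to the prompt leaves the language model itself untouched, which is what keeps the augmentation unsupervised.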
This list is automatically generated from the titles and abstracts of the papers on this site.