Bidirectional Language Models Are Also Few-shot Learners
- URL: http://arxiv.org/abs/2209.14500v1
- Date: Thu, 29 Sep 2022 01:35:57 GMT
- Title: Bidirectional Language Models Are Also Few-shot Learners
- Authors: Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin
Raffel, Chris Callison-Burch
- Abstract summary: We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models.
- Score: 54.37445173284831
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models such as GPT-3 (Brown et al., 2020) can perform
arbitrary tasks without undergoing fine-tuning after being prompted with only a
few labeled examples. An arbitrary task can be reformulated as a natural
language prompt, and a language model can be asked to generate the completion,
indirectly performing the task in a paradigm known as prompt-based learning. To
date, emergent prompt-based learning capabilities have mainly been demonstrated
for unidirectional language models. However, bidirectional language models
pre-trained on denoising objectives such as masked language modeling produce
stronger learned representations for transfer learning. This motivates the
possibility of prompting bidirectional models, but their pre-training
objectives have made them largely incompatible with the existing prompting
paradigm. We present SAP (Sequential Autoregressive Prompting), a technique
that enables the prompting of bidirectional models. Utilizing the machine
translation task as a case study, we prompt the bidirectional mT5 model (Xue et
al., 2021) with SAP and demonstrate its few-shot and zero-shot translations
outperform the few-shot translations of unidirectional models like GPT-3 and
XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We
further show SAP is effective on question answering and summarization. For the
first time, our results demonstrate prompt-based learning is an emergent
property of a broader class of language models, rather than only unidirectional
models.
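As a rough illustration of the idea behind SAP, the sketch below, assuming the Hugging Face transformers API and an mT5 checkpoint, repeatedly appends a mask sentinel to the text generated so far, lets the encoder-decoder fill it, keeps a small chunk of the prediction, and repeats, so that a model pre-trained only on span corruption can be decoded left to right. The checkpoint name, chunk size per step, stopping rule, and prompt template are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of sequential autoregressive prompting for a span-corruption
# model, assuming the Hugging Face `transformers` API. The checkpoint name,
# the chunk size kept per step, and the stopping rule are illustrative
# assumptions, not the paper's exact SAP procedure.
import re
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-xl"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

def sap_generate(prompt, max_steps=20, tokens_per_step=2):
    """Repeatedly ask the model to fill a sentinel appended to the text
    generated so far, emulating left-to-right decoding."""
    generated = ""
    for _ in range(max_steps):
        text = prompt + generated + " <extra_id_0>"
        inputs = tokenizer(text, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=10)
        decoded = tokenizer.decode(output[0], skip_special_tokens=False)
        # Keep only the span predicted for the first sentinel.
        parts = re.split(r"<extra_id_\d+>", decoded)
        span = parts[1] if len(parts) > 1 else ""
        span = span.replace("<pad>", "").replace("</s>", "").strip()
        if not span:
            break
        # Append a small chunk of the prediction and repeat.
        generated += " " + " ".join(span.split()[:tokens_per_step])
    return generated.strip()

# Few-shot machine translation prompt, mirroring the paper's case study
# (the exact prompt template is an assumption).
prompt = (
    "English: How are you? French: Comment allez-vous ?\n"
    "English: I love reading books. French: J'adore lire des livres.\n"
    "English: The weather is nice today. French:"
)
print(sap_generate(prompt))
```

Because each call decodes only a short continuation, producing a full output takes several forward passes; this overhead is the price of prompting a denoising-trained bidirectional model without any fine-tuning.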
Related papers
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explore the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We then apply multi-task learning to help the model generalize better to new tasks.
Experimental results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones; a minimal prompt-format sketch appears after this list.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)
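To make the cross-lingual in-context setup from "Language Models are Few-shot Multilingual Learners" concrete, the following is a minimal sketch, assuming the Hugging Face transformers API; GPT-2 stands in for the larger GPT and T5 models the paper evaluates, and the sentiment template, label set, and French test sentence are illustrative assumptions. English demonstrations are placed in the prompt, and the label whose continuation the model scores highest is taken as the prediction for a non-English test sample.

```python
# Minimal sketch of cross-lingual few-shot classification via prompting,
# assuming the Hugging Face `transformers` API. GPT-2 is a stand-in for the
# larger models evaluated in the paper; the template, labels, and test
# sentence are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# English demonstrations, followed by a French test sample.
prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I hated every minute of it. Sentiment: negative\n"
    "Review: Le film était merveilleux. Sentiment:"
)

def label_score(label):
    """Sum of log-probabilities the model assigns to the label tokens
    when they are appended to the prompt."""
    ids = tokenizer(prompt + " " + label, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    n = len(tokenizer(" " + label).input_ids)              # number of label tokens
    rows = torch.arange(logprobs.size(0) - n, logprobs.size(0))
    return logprobs[rows, ids[0, -n:]].sum().item()

prediction = max(["positive", "negative"], key=label_score)
print(prediction)
```

Scoring candidate labels by their log-likelihood under the prompt is one common way to turn a generative model into a classifier without any parameter updates; the paper's exact templates and decision rule may differ.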
This list is automatically generated from the titles and abstracts of the papers on this site.