Instruction Induction: From Few Examples to Natural Language Task Descriptions
- URL: http://arxiv.org/abs/2205.10782v1
- Date: Sun, 22 May 2022 09:22:37 GMT
- Title: Instruction Induction: From Few Examples to Natural Language Task Descriptions
- Authors: Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy
- Abstract summary: We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples.
InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are able to perform a task by conditioning on a few
input-output demonstrations - a paradigm known as in-context learning. We show
that language models can explicitly infer an underlying task from a few
demonstrations by prompting them to generate a natural language instruction
that fits the examples. To explore this ability, we introduce the instruction
induction challenge, compile a dataset consisting of 24 tasks, and define a
novel evaluation metric based on executing the generated instruction. We
discover that, to a large extent, the ability to generate instructions does
indeed emerge when using a model that is both large enough and aligned to
follow instructions; InstructGPT achieves 65.7% of human performance in our
execution-based metric, while the original GPT-3 model reaches only 9.8% of
human performance. This surprising result suggests that instruction induction
might be a viable learning paradigm in and of itself, where instead of fitting
a set of latent continuous parameters to the data, one searches for the best
description in the natural language hypothesis space.
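The two components the abstract describes, inducing an instruction from demonstrations and scoring it by executing that instruction, can be sketched in a few lines. The code below is a reconstruction from the abstract, not the authors' released implementation: the meta-prompt wording, the `complete` placeholder, and the example task are all assumptions.

```python
# Sketch of instruction induction and the execution-based metric,
# reconstructed from the abstract. `complete` is a placeholder for any
# text-completion API (e.g., an InstructGPT-style model), not the paper's code.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a text-completion model here")

def induce_instruction(demos: list[tuple[str, str]]) -> str:
    """Ask the model to verbalize the task underlying a few demonstrations."""
    pairs = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    prompt = (
        "I gave a friend an instruction and several inputs. The friend read "
        "the instruction and wrote an output for every input.\n\n"
        f"{pairs}\n\nThe instruction was:"
    )
    return complete(prompt).strip()

def execution_accuracy(instruction: str, test_pairs: list[tuple[str, str]]) -> float:
    """Execute the induced instruction on held-out inputs and compare
    the model's outputs against the gold answers."""
    hits = sum(
        complete(f"Instruction: {instruction}\nInput: {x}\nOutput:").strip() == y
        for x, y in test_pairs
    )
    return hits / len(test_pairs)

# Purely illustrative task: pluralization.
demos = [("cat", "cats"), ("box", "boxes"), ("child", "children")]
```

An induced instruction such as "Write the plural form of the given word." would score 1.0 under this metric if the executing model reproduces every gold output.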
Related papers
- Instruction Position Matters in Sequence Generation with Large Language Models (2023-08-23)
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by moving task instructions to a position after the input sentences, as illustrated in the sketch below.
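As a concrete illustration of that positional change, the sketch below contrasts the conventional instruction-first layout with the input-first layout the paper advocates; the function names and example strings are hypothetical, not the paper's code.

```python
# Hypothetical illustration of instruction placement; not the paper's code.

def pre_instruction_prompt(instruction: str, source: str) -> str:
    # Conventional layout: the instruction precedes the input sentence.
    return f"{instruction}\n{source}"

def post_instruction_prompt(instruction: str, source: str) -> str:
    # Proposed layout: the input sentence comes first, so the instruction
    # sits closer to the position where generation begins.
    return f"{source}\n{instruction}"

print(post_instruction_prompt("Translate the sentence above into German.",
                              "The weather is nice today."))
```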
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor (2022-12-19)
We introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor.
We collect 64,000 examples by prompting a language model with three seed examples of instructions and eliciting a fourth.
This set is then expanded by prompting the model to rephrase each instruction, creating a total of approximately 240,000 examples of instructions, inputs, and outputs.
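Read literally, the collection pipeline has two stages: elicit new instructions from three seed examples, then expand the set by rephrasing. Below is a minimal sketch under those assumptions; the prompt templates and the `complete` placeholder are invented, not the released pipeline.

```python
# Minimal sketch of the two-stage collection described above. `complete`
# stands in for any text-completion model; the templates are assumptions.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a language model here")

def elicit_instruction(seeds: list[str]) -> str:
    """Show three seed instructions and have the model write a fourth."""
    shown = "\n".join(f"Example {i}: {s}" for i, s in enumerate(seeds, start=1))
    return complete(f"{shown}\nExample 4:").strip()

def expand_by_rephrasing(instruction: str, n: int = 3) -> list[str]:
    """Grow the dataset by asking the model to rephrase each instruction."""
    prompt = f"Instruction: {instruction}\nRephrase the instruction above:"
    return [complete(prompt).strip() for _ in range(n)]
```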
- Skill Induction and Planning with Latent Language (2021-10-04)
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model from primarily unannotated demonstrations by parsing them into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
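The hierarchy this describes, goals expanding to natural-language subtask descriptions that in turn index executable skills, can be sketched structurally. The dataclass, skill names, and `plan` function below are illustrative assumptions, not the paper's interface.

```python
# Structural sketch of a language-indexed skill library; all names here
# are illustrative assumptions, not the paper's interface.

from dataclasses import dataclass

@dataclass
class Skill:
    command: str        # natural-language description that indexes the skill
    actions: list[str]  # low-level action sequence the skill expands to

LIBRARY = {
    "pick up the mug": Skill("pick up the mug", ["move_to(mug)", "grasp(mug)"]),
    "put it on the shelf": Skill("put it on the shelf",
                                 ["move_to(shelf)", "release(mug)"]),
}

def plan(goal: str, subtask_model) -> list[str]:
    """Goal -> high-level subtask descriptions -> flat action sequence."""
    actions: list[str] = []
    for command in subtask_model(goal):  # e.g., a trained goal-to-subtask model
        actions.extend(LIBRARY[command].actions)
    return actions
```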
- Reframing Instructional Prompts to GPTk's Language (2021-09-16)
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The gains are particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is not feasible.
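One reframing strategy the paper explores is turning a dense, multi-clause instruction into short enumerated steps. The helper below is a hypothetical illustration of that idea, not code from the paper.

```python
# Hypothetical illustration of itemizing a long instruction into enumerated
# steps, one of the reframing ideas discussed above; not the paper's code.

def itemize_instruction(steps: list[str], task_input: str) -> str:
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"Follow these steps:\n{numbered}\n\nInput: {task_input}\nOutput:"

prompt = itemize_instruction(
    ["Read the passage.",
     "Find the sentence that answers the question.",
     "Copy that sentence verbatim."],
    "Passage: <passage text> Question: <question text>",
)
print(prompt)
```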
- Finetuned Language Models Are Zero-Shot Learners (2021-09-03)
We show that instruction tuning boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
We evaluate this instruction-tuned model, which we call FLAN, on unseen task types.
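Verbalizing a task "via natural language instruction templates" amounts to rendering each labeled example through several instruction phrasings. The NLI templates below are invented for illustration; FLAN's released templates differ.

```python
# Sketch of template-based task verbalization for instruction tuning.
# These NLI templates are invented for illustration; FLAN's differ.

TEMPLATES = [
    ("Premise: {premise}\nHypothesis: {hypothesis}\n"
     "Does the premise entail the hypothesis? Answer yes or no."),
    ("Read the two sentences below and say whether the second follows "
     "from the first.\n1. {premise}\n2. {hypothesis}"),
]

def verbalize(premise: str, hypothesis: str, template_id: int = 0) -> str:
    """Render one NLI example as a natural-language instruction prompt."""
    return TEMPLATES[template_id].format(premise=premise, hypothesis=hypothesis)

# Each (prompt, target) pair becomes one instruction-tuning example.
example = (verbalize("A man is playing guitar.", "A person is making music."), "yes")
```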
- The Turking Test: Can Language Models Understand Instructions? (2020-10-22)
We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity.
Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks.