Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- URL: http://arxiv.org/abs/2212.09689v1
- Date: Mon, 19 Dec 2022 18:21:00 GMT
- Title: Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Authors: Or Honovich, Thomas Scialom, Omer Levy, Timo Schick
- Abstract summary: We introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor.
We collect 64,000 examples by prompting a language model with three seed examples of instructions and eliciting a fourth.
This set is then expanded by prompting the model to rephrase each instruction, creating a total of approximately 240,000 examples of instructions, inputs, and outputs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning enables pretrained language models to perform new tasks
from inference-time natural language descriptions. These approaches rely on
vast amounts of human supervision in the form of crowdsourced datasets or user
interactions. In this work, we introduce Unnatural Instructions: a large
dataset of creative and diverse instructions, collected with virtually no human
labor. We collect 64,000 examples by prompting a language model with three seed
examples of instructions and eliciting a fourth. This set is then expanded by
prompting the model to rephrase each instruction, creating a total of
approximately 240,000 examples of instructions, inputs, and outputs.
Experiments show that despite containing a fair amount of noise, training on
Unnatural Instructions rivals the effectiveness of training on open-source
manually-curated datasets, surpassing the performance of models such as T0++
and Tk-Instruct across various benchmarks. These results demonstrate the
potential of model-generated data as a cost-effective alternative to
crowdsourcing for dataset expansion and diversification.
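The two-stage collection recipe above is simple enough to sketch in code. The following is a minimal Python sketch, not the paper's exact setup: the `complete` helper, seed demonstrations, prompt wording, and sampling parameters are all illustrative assumptions. It shows the core loop of eliciting a fourth example from three seed demonstrations, then expanding the set by rephrasing each collected instruction.

```python
# Illustrative sketch of the Unnatural Instructions collection loop:
# (1) show a model three seed demonstrations and elicit a fourth,
# (2) expand the set by asking the model to rephrase each instruction.
# The prompt format and helper below are assumptions for illustration.

import random

SEED_POOL = [
    {"instruction": "Translate the sentence to French.", "input": "Good morning."},
    {"instruction": "Summarize the paragraph in one sentence.", "input": "..."},
    {"instruction": "Classify the review as positive or negative.", "input": "..."},
    # ... more manually written seed demonstrations
]

def complete(prompt: str) -> str:
    """Placeholder for a call to a large language model (e.g., an
    OpenAI-style completion endpoint). Swap in a real client here."""
    raise NotImplementedError

def elicit_example() -> str:
    """Show three seed demonstrations and let the model write a fourth."""
    demos = random.sample(SEED_POOL, 3)
    prompt = ""
    for i, d in enumerate(demos, 1):
        prompt += f"Example {i}\nInstruction: {d['instruction']}\nInput: {d['input']}\n\n"
    prompt += "Example 4\nInstruction:"
    return complete(prompt).strip()

def rephrase(instruction: str) -> str:
    """Expand the dataset by eliciting an alternative phrasing."""
    return complete(
        f"Instruction: {instruction}\n"
        "Rephrase the instruction above without changing its meaning.\n"
        "Alternative phrasing:"
    ).strip()
```

In the paper, elicited examples also carry structured fields (such as constraints) and pass through deduplication before outputs are generated; the sketch omits those details for brevity.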
Related papers
- Forcing Diffuse Distributions out of Language Models [70.28345569190388]
Despite being trained specifically to follow user instructions, today's instruction-tuned language models perform poorly when instructed to produce random outputs.
We propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes.
arXiv Detail & Related papers (2024-04-16T19:17:23Z)
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning [24.741736629886564]
Instruction tuning is a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions.
We introduce MultiInstruct, the first multimodal instruction tuning benchmark dataset.
We show strong zero-shot performance on various unseen multimodal tasks and the benefit of transfer learning from a text-only instruction dataset.
arXiv Detail & Related papers (2022-12-21T05:17:06Z)
- Self-Instruct: Aligning Language Models with Self-Generated Instructions [76.42871502364697]
Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models.
Our pipeline generates instruction, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model (a minimal sketch of such a similarity filter appears after this list).
For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT-3 with Self-Instruct outperforms using existing public instruction datasets by a large margin.
arXiv Detail & Related papers (2022-12-20T18:59:19Z)
- Instruction Induction: From Few Examples to Natural Language Task Descriptions [55.139554327372934]
We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples.
InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
arXiv Detail & Related papers (2022-05-22T09:22:37Z)
- How Many Data Samples is an Additional Instruction Worth? [20.66688303609522]
The recently introduced instruction paradigm empowers non-expert users to leverage NLP resources by defining a new task in natural language.
Our results indicate that an additional instruction can be equivalent to 200 data samples on average across tasks.
arXiv Detail & Related papers (2022-03-17T08:30:30Z)
- WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation [101.00109827301235]
We introduce a novel paradigm for dataset creation based on human and machine collaboration.
We use dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instruct GPT-3 to compose new examples with similar patterns.
The resulting dataset, WANLI, consists of 108,357 natural language inference (NLI) examples that present unique empirical strengths.
arXiv Detail & Related papers (2022-01-16T03:13:49Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- The Turking Test: Can Language Models Understand Instructions? [45.266428794559495]
We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity.
Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks.
arXiv Detail & Related papers (2020-10-22T18:44:16Z)
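As referenced in the Self-Instruct entry above, a key step in model-generated-data pipelines is discarding generated instructions that are too similar to ones already collected. Below is a minimal sketch of such a filter, assuming the `rouge-score` package and the ROUGE-L overlap threshold of 0.7 that the Self-Instruct paper describes; the helper name and usage are illustrative assumptions.

```python
# Minimal sketch of a similarity filter for generated instructions:
# keep a candidate only if its ROUGE-L overlap with every instruction
# already in the pool stays below a threshold (0.7 in Self-Instruct).
# Requires: pip install rouge-score

from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def is_novel(candidate: str, pool: list[str], threshold: float = 0.7) -> bool:
    """Return True if the candidate instruction is sufficiently different
    from every instruction already collected."""
    for existing in pool:
        overlap = _scorer.score(existing, candidate)["rougeL"].fmeasure
        if overlap >= threshold:
            return False
    return True

pool = ["Translate the sentence to French."]
print(is_novel("Rewrite the sentence in French.", pool))     # True (overlap ~0.6)
print(is_novel("Translate this sentence to French.", pool))  # False (overlap ~0.8)
```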
This list is automatically generated from the titles and abstracts of the papers on this site.