Natural Instructions: Benchmarking Generalization to New Tasks from
Natural Language Instructions
- URL: http://arxiv.org/abs/2104.08773v1
- Date: Sun, 18 Apr 2021 08:44:56 GMT
- Title: Natural Instructions: Benchmarking Generalization to New Tasks from
Natural Language Instructions
- Authors: Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi
- Abstract summary: We evaluate existing state-of-the-art language models (LMs) in addressing new tasks by few-shot prompting of GPT-3 and fine-tuning BART.
Our analysis indicates that: (a) the existing models indeed benefit from instructions and hence, show improved generalization to new tasks; (b) while models like GPT-3 generally benefit from instructions, the extent of their gains varies across different fields of instructions and also depends on the task being solved.
- Score: 48.32337380549338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can we enable NLP models to appropriately respond to instructional prompts
and consequently generalize to new tasks? To study this question, we leverage
the existing NLP datasets and the instructions that were used to crowdsource
them to create NATURAL INSTRUCTIONS, a dataset of instructions and
task-specific input/output data. This dataset consists of 61 distinct language
instructions and about 600k task instances, and is used to evaluate existing
state-of-the-art language models (LMs) in addressing new tasks by few-shot
prompting of GPT-3 and fine-tuning BART. Our analysis indicates that: (a) the
existing models indeed benefit from instructions and hence, show improved
generalization to new tasks; (b) while models like GPT-3 generally benefit from
instructions, the extent of their gains varies across different fields of
instructions and also depends on the task being solved; (c) generalization to
unseen tasks in NATURAL INSTRUCTIONS remains far from perfect for the
state-of-the-art, indicating significant room for more progress in this
direction.
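The following is a minimal sketch, not the authors' released code, of how a few-shot, instruction-conditioned prompt of the kind evaluated above might be assembled from a task's instruction and a handful of solved examples. The simplified Task schema, its field names, and the toy example task are illustrative assumptions rather than the exact NATURAL INSTRUCTIONS format.

```python
# Minimal sketch of instruction-conditioned few-shot prompting, loosely
# following the setup described in the abstract above. Field names and
# the example task are simplified assumptions, not the authors' schema.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Task:
    title: str
    definition: str                      # the crowdsourcing instruction
    positive_examples: List[Tuple[str, str]] = field(default_factory=list)


def build_prompt(task: Task, new_input: str, k: int = 3) -> str:
    """Concatenate the instruction, up to k solved examples, and the new instance."""
    parts = [f"Task: {task.title}", f"Definition: {task.definition}", ""]
    for x, y in task.positive_examples[:k]:
        parts += [f"Input: {x}", f"Output: {y}", ""]
    parts += [f"Input: {new_input}", "Output:"]
    return "\n".join(parts)


if __name__ == "__main__":
    task = Task(
        title="Question Typing",
        definition="Label the question as 'yes/no' or 'open-ended'.",
        positive_examples=[("Is Paris in France?", "yes/no"),
                           ("Why is the sky blue?", "open-ended")],
    )
    print(build_prompt(task, "Did the meeting start on time?"))
    # The resulting string would be sent to a model such as GPT-3, and the
    # generated continuation compared against the gold output for the
    # held-out task.
```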
Related papers
- UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks.
We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting task instructions to a position after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
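As a rough illustration of the idea summarized in the entry above, the snippet below renders the same request with the task instruction placed before versus after the input sentence; the templates are assumptions for illustration, not the prompts used in the paper.

```python
# Illustrative sketch of moving the task instruction after the input,
# as described in the entry above. The templates are assumptions,
# not the exact prompts used in the paper.

def prompt_instruction_first(instruction: str, source: str) -> str:
    return f"{instruction}\n{source}\nOutput:"


def prompt_instruction_last(instruction: str, source: str) -> str:
    # Placing the instruction after the (possibly long) input keeps it
    # close to the position where generation starts.
    return f"{source}\n{instruction}\nOutput:"


if __name__ == "__main__":
    instruction = "Translate the following sentence into German."
    source = "The weather is nice today."
    print(prompt_instruction_first(instruction, source))
    print("---")
    print(prompt_instruction_last(instruction, source))
```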
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor [48.116843121810135]
We introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor.
We collect 64,000 examples by prompting a language model with three seed examples of instructions and eliciting a fourth.
This set is then expanded by prompting the model to rephrase each instruction, creating a total of approximately 240,000 examples of instructions, inputs, and outputs.
arXiv Detail & Related papers (2022-12-19T18:21:00Z)
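To make the collection recipe in the Unnatural Instructions entry concrete, here is a hedged sketch of the elicit-then-rephrase loop: show a model three seed instructions, ask for a fourth, then paraphrase what was collected. The complete() stub, prompt wording, and example seeds are assumptions, not the authors' pipeline.

```python
# Sketch of the elicit-then-rephrase recipe summarized above: show a model
# three seed instructions, elicit a fourth, then ask it to paraphrase each
# collected instruction. The complete() stub and prompt wording are
# assumptions, not the authors' pipeline.

import random
from typing import Callable, List


def elicit_instruction(seeds: List[str], complete: Callable[[str], str]) -> str:
    """Prompt the model with three seed instructions and elicit a fourth."""
    examples = random.sample(seeds, 3)
    prompt = "\n".join(f"Instruction {i + 1}: {s}" for i, s in enumerate(examples))
    prompt += "\nInstruction 4:"
    return complete(prompt).strip()


def rephrase_instruction(instruction: str, complete: Callable[[str], str]) -> str:
    """Expand the set by asking the model to paraphrase an instruction."""
    return complete(f"Rephrase the following instruction:\n{instruction}\nRephrased:").strip()


def collect(seeds: List[str], complete: Callable[[str], str], n: int) -> List[str]:
    collected = [elicit_instruction(seeds, complete) for _ in range(n)]
    return collected + [rephrase_instruction(ins, complete) for ins in collected]


if __name__ == "__main__":
    # Stand-in for a real text-completion model.
    fake_model = lambda prompt: "Summarize the given paragraph in one sentence."
    seeds = ["Translate the sentence into French.",
             "Answer the question using the passage.",
             "Classify the review as positive or negative.",
             "List three synonyms of the given word."]
    print(collect(seeds, fake_model, n=2))
```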
- Boosting Natural Language Generation from Instructions with Meta-Learning [43.64522457686405]
Recent work has shown that language models (LMs) trained with multi-task instructional learning (MTIL) can solve diverse NLP tasks with improved performance compared to prompt tuning.
In this paper we investigate whether meta-learning applied to MTIL can further improve generalization to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2022-10-20T22:23:23Z)
- Instruction Induction: From Few Examples to Natural Language Task Descriptions [55.139554327372934]
We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples.
InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
arXiv Detail & Related papers (2022-05-22T09:22:37Z)
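The entry above describes prompting a model to state the instruction underlying a few demonstrations; the sketch below shows one plausible way to format such an induction prompt. The template and the toy antonym task are illustrative assumptions, and the paper's execution-based metric is not reproduced here.

```python
# Rough sketch of instruction induction as summarized above: format a few
# input/output demonstrations and ask the model to state the instruction
# that maps inputs to outputs. The template is an illustrative assumption.

from typing import Callable, List, Tuple


def induction_prompt(demos: List[Tuple[str, str]]) -> str:
    lines = ["Here are some input-output pairs:"]
    for x, y in demos:
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append("The instruction that maps every input to its output is:")
    return "\n\n".join(lines)


def induce_instruction(demos: List[Tuple[str, str]],
                       complete: Callable[[str], str]) -> str:
    return complete(induction_prompt(demos)).strip()


if __name__ == "__main__":
    demos = [("cold", "hot"), ("tall", "short"), ("fast", "slow")]
    # Stand-in for a real text-completion model.
    fake_model = lambda prompt: "Write the antonym of the given word."
    print(induce_instruction(demos, fake_model))
```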
- How Many Data Samples is an Additional Instruction Worth? [20.66688303609522]
The recently introduced instruction paradigm empowers non-expert users to leverage NLP resources by defining a new task in natural language.
Our results indicate that an additional instruction can be equivalent to 200 data samples on average across tasks.
arXiv Detail & Related papers (2022-03-17T08:30:30Z)
- InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER [31.32381919473188]
We propose a multi-task instruction-based generative framework, named InstructionNER, for low-resource named entity recognition.
Specifically, we reformulate the NER task as a generation problem, which enriches source sentences with task-specific instructions and answer options, then infers the entities and their types in natural language.
Experimental results show that our method consistently outperforms other baselines on five datasets in few-shot settings.
arXiv Detail & Related papers (2022-03-08T07:56:36Z)
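As a loose illustration of the InstructionNER formulation above, the snippet builds an instruction-plus-options prompt for a sentence and parses a generated answer back into (entity, type) pairs; the prompt wording and the answer format are assumptions, not the paper's templates.

```python
# Sketch of recasting NER as instruction-conditioned generation, in the
# spirit of the InstructionNER entry above. The prompt wording and the
# "entity is a TYPE" answer format are illustrative assumptions.

from typing import List, Tuple


def ner_prompt(sentence: str, entity_types: List[str]) -> str:
    options = ", ".join(entity_types)
    return (f"Sentence: {sentence}\n"
            f"Instruction: list all named entities in the sentence and their types.\n"
            f"Options: {options}\n"
            f"Answer:")


def parse_answer(answer: str) -> List[Tuple[str, str]]:
    """Parse lines of the form '<entity> is a <TYPE>' into (entity, type) pairs."""
    pairs = []
    for line in answer.splitlines():
        if " is a " in line:
            entity, etype = line.split(" is a ", 1)
            pairs.append((entity.strip(), etype.strip().rstrip(".")))
    return pairs


if __name__ == "__main__":
    print(ner_prompt("Ada Lovelace was born in London.",
                     ["PERSON", "LOCATION", "ORGANIZATION"]))
    # A generative model's answer might look like the following:
    fake_answer = "Ada Lovelace is a PERSON\nLondon is a LOCATION"
    print(parse_answer(fake_answer))
```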
- Quantifying Adaptability in Pre-trained Language Models with 500 Tasks [60.0364822929442]
We present a large-scale empirical study of the features and limits of LM adaptability using a new benchmark, TaskBench500.
We evaluate three facets of adaptability, finding that adaptation procedures differ dramatically in their ability to memorize small datasets.
Our experiments show that adaptability to new tasks, like generalization to new examples, can be systematically described and understood.
arXiv Detail & Related papers (2021-12-06T18:00:25Z)
- Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important on large language models, such as GPT-3, where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.