Related papers: The Turking Test: Can Language Models Understand Instructions?

The Turking Test: Can Language Models Understand Instructions?

URL: http://arxiv.org/abs/2010.11982v1
Date: Thu, 22 Oct 2020 18:44:16 GMT
Title: The Turking Test: Can Language Models Understand Instructions?
Authors: Avia Efrat and Omer Levy
Abstract summary: We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks.
Score: 45.266428794559495
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple tasks, like retrieving the nth word of a sentence, to ones that require creativity, such as generating examples for SNLI and SQuAD in place of human intelligence workers ("turkers"). Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks. Analyzing the model's error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task. While it is not yet clear whether instruction understanding can be captured by traditional language models, the sheer expressivity of instruction understanding makes it an appealing alternative to the rising few-shot inference paradigm.

Related papers

An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models [99.31449616860291]
Modern language models (LMs) can learn to perform new tasks in different ways. In instruction following, the target task is described explicitly in natural language; in few-shot prompting, the task is specified implicitly. In instruction inference, LMs are presented with in-context examples and are then prompted to generate a natural language task description.
arXiv Detail & Related papers (2024-04-03T19:31:56Z)
LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity [11.44413929033824]
We develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow. We conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias.
arXiv Detail & Related papers (2023-04-12T22:55:52Z)
Benchmarking Language Models for Code Syntax Understanding [79.11525961219591]
Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding. In this work, we perform the first thorough benchmarking of the state-of-the-art pre-trained models for identifying the syntactic structures of programs. Our findings point out key limitations of existing pre-training methods for programming languages, and suggest the importance of modeling code syntactic structures.
arXiv Detail & Related papers (2022-10-26T04:47:18Z)
Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. We apply multi-task learning to make the model learn to generalize to new tasks better. Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
Instruction Induction: From Few Examples to Natural Language Task Descriptions [55.139554327372934]
We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. InstructGPT achieves 65.7% of human performance in our execution-based metric, while the original GPT-3 model reaches only 9.8% of human performance.
arXiv Detail & Related papers (2022-05-22T09:22:37Z)
LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations. Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [0.0]
We discuss methods of prompt programming, emphasizing the usefulness of considering prompts through the lens of natural language. We introduce the idea of a metaprompt that seeds the model to generate its own natural language prompts for a range of tasks.
arXiv Detail & Related papers (2021-02-15T05:27:55Z)
Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories. We find that human demonstrations help solve the most complex tasks. We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z)
Language Conditioned Imitation Learning over Unstructured Data [9.69886122332044]
We present a method for incorporating free-form natural language conditioning into imitation learning. Our approach learns perception from pixels, natural language understanding, and multitask continuous control end-to-end as a single neural network. We show this dramatically improves language conditioned performance, while reducing the cost of language annotation to less than 1% of total data.
arXiv Detail & Related papers (2020-05-15T17:08:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.