Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for
Instruction Generation Models
- URL: http://arxiv.org/abs/2301.05149v2
- Date: Sun, 28 May 2023 14:34:54 GMT
- Title: Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for
Instruction Generation Models
- Authors: Lingjun Zhao and Khanh Nguyen and Hal Daumé III
- Abstract summary: Recent work studies the cognitive capabilities of language models through psychological tests designed for humans.
We formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks.
- Score: 5.975913042883176
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent work studies the cognitive capabilities of language models through
psychological tests designed for humans. While these studies are helpful for
understanding the general capabilities of these models, there is no guarantee
that a model possessing sufficient capabilities to pass those tests would
actually use those capabilities in performing real-life tasks. In this work, we
formulate task-oriented cognitive capabilities, which are human-like cognitive
capabilities that language models leverage to perform tasks. These capabilities
are (i) the ability to quickly generate good candidate utterances (the search
capability), and (ii) the ability to predict how a listener will interpret those
utterances and to choose the most appropriate one (the pragmatic capability). We
design an evaluation scheme for comparing these capabilities of a language
model with those of a human. Applying this scheme to examine various models in
a navigation instruction generation problem, we find that their pragmatic
capability is severely lacking. This insight leads us to augment them with
better models of the listener and obtain a significant boost of 11% in success
rate in guiding real humans. Our work advocates for having a principled
procedure for aligning language models with humans that involves (i)
formulating task-oriented capabilities, (ii) devising a method to quantify
their deficiency, and (iii) iteratively improving them.
Related papers
- Can Language Models Learn to Skip Steps? [59.84848399905409]
We study the ability to skip steps in reasoning.
Unlike humans, who may skip steps to enhance efficiency or to reduce cognitive load, models do not possess such motivations.
Our work presents the first exploration into human-like step-skipping ability.
arXiv Detail & Related papers (2024-11-04T07:10:24Z)
- Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks.
This paper exploits the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks.
We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
arXiv Detail & Related papers (2024-07-12T14:13:06Z)
- Auxiliary task demands mask the capabilities of smaller language models [2.938889003635811]
We show that evaluation methods with greater task demands yield lower performance than evaluations with reduced demands.
Our results illustrate that LM performance should not be interpreted as a direct indication of intelligence.
arXiv Detail & Related papers (2024-04-03T02:56:52Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake? [62.59699229202307]
Despite advances in AI, it remains a significant challenge to develop interactive task guidance systems.
We created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor.
We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance.
arXiv Detail & Related papers (2023-11-01T15:13:49Z)
- Are Emergent Abilities in Large Language Models just In-Context Learning? [46.561464069450444]
We present a novel theory that explains emergent abilities, taking into account their potential confounding factors.
Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge.
arXiv Detail & Related papers (2023-09-04T20:54:11Z)
- Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach in which multiple language model instances propose answers and then debate their responses and reasoning processes over multiple rounds to arrive at a common final answer (a minimal sketch of this protocol follows this list).
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be applied directly to existing black-box models and uses an identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
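The multiagent-debate entry above describes a simple protocol: several model instances answer independently, then iteratively revise after reading one another's responses. Below is a minimal sketch under those assumptions, written against a generic `ask` completion function; the function name, prompts, agent count, and round count are hypothetical, and the paper's exact prompting and aggregation differ in detail.

```python
from typing import Callable, List

def multiagent_debate(ask: Callable[[str], str], question: str,
                      num_agents: int = 3, num_rounds: int = 2) -> List[str]:
    # Round 0: each agent independently proposes an answer with reasoning.
    answers = [ask(f"Answer with your reasoning: {question}")
               for _ in range(num_agents)]
    for _ in range(num_rounds):
        revised = []
        for i in range(num_agents):
            # Each agent reads the other agents' answers and updates its own.
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            revised.append(ask(
                f"Question: {question}\n\nOther agents answered:\n{others}\n\n"
                "Considering their reasoning, give your updated answer."))
        answers = revised
    return answers  # the rounds are intended to converge on a common answer
```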
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.