Language models show human-like content effects on reasoning tasks
- URL: http://arxiv.org/abs/2207.07051v4
- Date: Wed, 17 Jul 2024 22:01:29 GMT
- Title: Language models show human-like content effects on reasoning tasks
- Authors: Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill
- Abstract summary: Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections.
Human reasoning is affected by real-world knowledge and beliefs, and humans reason more reliably when the semantic content of a problem supports the correct logical inferences.
Our findings have implications for understanding both these cognitive effects in humans and the factors that contribute to language model performance.
- Score: 33.677483386820555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns play a central role in debates about the fundamental nature of human intelligence. Here, we investigate whether language models, whose prior expectations capture some aspects of human knowledge, similarly mix content into their answers to logical problems. We explored this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state-of-the-art large language models, as well as humans, and find that the language models reflect many of the same patterns observed in humans across these tasks: like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected both in answer patterns and in lower-level features like the relationship between model answer distributions and human response times. Our findings have implications for understanding both these cognitive effects in humans and the factors that contribute to language model performance.
Related papers
- Perceptions of Linguistic Uncertainty by Language Models and Humans [26.69714008538173]
We investigate how language models map linguistic expressions of uncertainty to numerical responses.
We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner.
This sensitivity indicates that language models are substantially more susceptible to bias based on their prior knowledge.
arXiv Detail & Related papers (2024-07-22T17:26:12Z)
- Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
- UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
- A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models [39.77445889769015]
We show that, within the PaLM2 family of language models, larger models are more logical than smaller ones.
Even the largest models make systematic errors, some of which mirror human reasoning biases.
Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.
arXiv Detail & Related papers (2023-11-01T11:13:06Z)
- Learning the meanings of function words from grounded language using a visual question answering model [28.10687343493772]
We show that recent neural-network based visual question answering models can learn to use function words as part of answering questions about complex visual scenes.
We find that these models can learn the meanings of the logical connectives "and" and "or" without any prior knowledge of logical reasoning.
Our findings offer proof-of-concept evidence that it is possible to learn the nuanced interpretations of function words in visually grounded context.
arXiv Detail & Related papers (2023-08-16T18:53:39Z)
- The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study the process of language driving and influencing social reasoning in a probabilistic goal inference domain.
We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios.
Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- Testing AI on language comprehension tasks reveals insensitivity to underlying meaning [3.335047764053173]
Large Language Models (LLMs) are recruited in applications that span from clinical assistance and legal support to question answering and education.
Yet, reverse-engineering is bound by Moravec's Paradox, according to which easy skills are hard.
We systematically assess 7 state-of-the-art models on a novel benchmark.
arXiv Detail & Related papers (2023-02-23T20:18:52Z)
- A fine-grained comparison of pragmatic language understanding in humans and language models [2.231167375820083]
We compare language models and humans on seven pragmatic phenomena.
We find that the largest models achieve high accuracy and match human error patterns.
We also find preliminary evidence that models and humans are sensitive to similar linguistic cues.
arXiv Detail & Related papers (2022-12-13T18:34:59Z)
- Towards Abstract Relational Learning in Human Robot Interaction [73.67226556788498]
Humans have a rich representation of the entities in their environment.
If robots need to interact successfully with humans, they need to represent entities, attributes, and generalizations in a similar way.
In this work, we address the problem of how to obtain these representations through human-robot interaction.
arXiv Detail & Related papers (2020-11-20T12:06:46Z)