A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models
- URL: http://arxiv.org/abs/2311.00445v2
- Date: Thu, 11 Apr 2024 16:49:57 GMT
- Title: A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models
- Authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen
- Abstract summary: We show that, within the PaLM2 family of language models, larger models are more logical than smaller ones.
Even the largest models make systematic errors, some of which mirror human reasoning biases.
Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.
- Score: 39.77445889769015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate such human biases, or are they able to overcome them? Focusing on the case of syllogisms -- inferences from two simple premises -- we show that, within the PaLM2 family of transformer language models, larger models are more logical than smaller ones, and also more logical than humans. At the same time, even the largest models make systematic errors, some of which mirror human reasoning biases: they show sensitivity to the (irrelevant) ordering of the variables in the syllogism, and draw confident but incorrect inferences from particular syllogisms (syllogistic fallacies). Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.
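As background for the abstract above, here is a minimal, self-contained sketch (not taken from the paper) of what it means for a syllogism to be valid: the conclusion must hold in every model of the two premises, and a syllogistic fallacy is an argument form that admits a counterexample. The brute-force checker over a small finite universe, and the helper names `holds` and `valid`, are illustrative assumptions rather than the paper's evaluation code.

```python
# Minimal sketch (not from the paper): check syllogism validity by brute force,
# enumerating every assignment of three terms A, B, C over a small finite universe.
from itertools import product

def holds(quantifier, xs, ys):
    """Evaluate one categorical statement about sets xs and ys."""
    if quantifier == "all":       # All X are Y
        return xs <= ys
    if quantifier == "some":      # Some X are Y
        return bool(xs & ys)
    if quantifier == "no":        # No X are Y
        return not (xs & ys)
    if quantifier == "some_not":  # Some X are not Y
        return bool(xs - ys)
    raise ValueError(f"unknown quantifier: {quantifier}")

def valid(premise1, premise2, conclusion, universe_size=4):
    """Return True iff the conclusion holds in every model of both premises."""
    n = universe_size
    for bits in product([False, True], repeat=3 * n):
        terms = {
            "A": {i for i in range(n) if bits[i]},
            "B": {i for i in range(n) if bits[n + i]},
            "C": {i for i in range(n) if bits[2 * n + i]},
        }
        p1 = holds(premise1[0], terms[premise1[1]], terms[premise1[2]])
        p2 = holds(premise2[0], terms[premise2[1]], terms[premise2[2]])
        c = holds(conclusion[0], terms[conclusion[1]], terms[conclusion[2]])
        if p1 and p2 and not c:
            return False  # counterexample: premises true, conclusion false
    return True

# Valid form (Barbara): All A are B; All B are C; therefore All A are C.
print(valid(("all", "A", "B"), ("all", "B", "C"), ("all", "A", "C")))    # True
# A classic fallacious form: Some A are B; Some B are C; therefore Some A are C.
print(valid(("some", "A", "B"), ("some", "B", "C"), ("some", "A", "C"))) # False
```

The second call returns False because a counterexample exists (e.g., A = {0}, B = {0, 1}, C = {1}); drawing a confident conclusion from such a form is the kind of fallacy the abstract refers to.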
Related papers
- A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles [0.06554326244334868]
We evaluate large language models' sensitivity to argument roles by replicating psycholinguistic studies on human argument role processing.
We find that language models can distinguish verbs appearing in plausible versus implausible contexts, where plausibility is determined by the relation between the verb and its preceding arguments; however, this capacity does not appear to arise from the same mechanism that underlies human real-time sentence processing.
arXiv Detail & Related papers (2024-10-21T16:05:58Z) - Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset [5.695579108997392]
This paper explores the question of how accurately current large language models can perform logical reasoning in natural language.
We present a syllogism dataset called NeuBAROCO, which consists of syllogistic reasoning problems in English and Japanese.
Our experiments with leading large language models indicate that these models exhibit reasoning biases similar to humans, along with other error tendencies.
arXiv Detail & Related papers (2024-08-08T12:10:50Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate language models' ability to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Empower Nested Boolean Logic via Self-Supervised Curriculum Learning [67.46052028752327]
We find that pre-trained language models, including large language models, behave like random selectors when faced with multi-nested Boolean logic.
To equip language models with this fundamental capability, this paper proposes a new self-supervised learning method, Curriculum Logical Reasoning (CLR).
arXiv Detail & Related papers (2023-10-09T06:54:02Z) - The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study how language drives and influences social reasoning in a probabilistic goal-inference domain.
We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios.
Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z) - Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases [8.583432139919616]
This paper investigates whether current large language models exhibit biases in logical reasoning, similar to humans.
We focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction.
We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects.
arXiv Detail & Related papers (2023-06-21T21:04:11Z) - A fine-grained comparison of pragmatic language understanding in humans and language models [2.231167375820083]
We compare language models and humans on seven pragmatic phenomena.
We find that the largest models achieve high accuracy and match human error patterns.
We also find preliminary evidence that models and humans are sensitive to similar linguistic cues.
arXiv Detail & Related papers (2022-12-13T18:34:59Z) - Language models show human-like content effects on reasoning tasks [33.677483386820555]
Large language models (LMs) achieve above-chance performance on reasoning tasks, but exhibit many imperfections.
Human reasoning is affected by real-world knowledge: people reason more reliably when the semantic content of a task supports the logical inferences.
Our findings have implications both for understanding these cognitive effects in humans and for identifying the factors that contribute to language model performance.
arXiv Detail & Related papers (2022-07-14T16:51:09Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.