Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset
- URL: http://arxiv.org/abs/2408.04403v1
- Date: Thu, 8 Aug 2024 12:10:50 GMT
- Title: Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset
- Authors: Kentaro Ozeki, Risako Ando, Takanobu Morishita, Hirohiko Abe, Koji Mineshima, Mitsuhiro Okada
- Abstract summary: This paper explores the question of how accurately current large language models can perform logical reasoning in natural language.
We present a syllogism dataset called NeuBAROCO, which consists of syllogistic reasoning problems in English and Japanese.
Our experiments with leading large language models indicate that these models exhibit reasoning biases similar to humans, along with other error tendencies.
- Score: 5.695579108997392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the question of how accurately current large language models can perform logical reasoning in natural language, with an emphasis on whether these models exhibit reasoning biases similar to humans. Specifically, our study focuses on syllogistic reasoning, a form of deductive reasoning extensively studied in cognitive science as a natural form of human reasoning. We present a syllogism dataset called NeuBAROCO, which consists of syllogistic reasoning problems in English and Japanese. This dataset was originally designed for psychological experiments to assess human reasoning capabilities using various forms of syllogisms. Our experiments with leading large language models indicate that these models exhibit reasoning biases similar to humans, along with other error tendencies. Notably, there is significant room for improvement in reasoning problems where the relationship between premises and hypotheses is neither entailment nor contradiction. We also present experimental results and in-depth analysis using a new Chain-of-Thought prompting method, which asks LLMs to translate syllogisms into abstract logical expressions and then explain their reasoning process. Our analysis using this method suggests that the primary limitations of LLMs lie in the reasoning process itself rather than the interpretation of syllogisms.
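The evaluation described in the abstract treats each syllogism as a three-way classification problem (entailment, contradiction, or neither), and the proposed Chain-of-Thought prompt asks the model to first translate the syllogism into abstract logical expressions before reasoning. The following is a minimal sketch of how such a prompt and label parser could be wired up; the prompt wording, the example item, and the `query_llm` placeholder are illustrative assumptions, not the paper's exact materials.

```python
# Minimal sketch of a translate-then-reason Chain-of-Thought setup, assuming a
# NeuBAROCO-style premise/premise/hypothesis item and a generic LLM call.
# The prompt text and the example syllogism are illustrative, not from the paper.

LABELS = ("entailment", "contradiction", "neither")

TRANSLATE_THEN_REASON_PROMPT = """\
Premise 1: {p1}
Premise 2: {p2}
Hypothesis: {h}

Step 1: Translate the two premises and the hypothesis into abstract logical
forms (e.g., "All A are B", "Some B are not C"), replacing content words with
schematic letters.
Step 2: Using only the abstract forms, explain step by step whether the
hypothesis follows from the premises.
Step 3: Answer with exactly one label: entailment, contradiction, or neither.
"""


def build_prompt(premise1: str, premise2: str, hypothesis: str) -> str:
    """Fill a three-part syllogism into the prompt template."""
    return TRANSLATE_THEN_REASON_PROMPT.format(p1=premise1, p2=premise2, h=hypothesis)


def parse_label(model_output: str) -> str:
    """Take the last label mentioned in the model's answer; default to 'neither'."""
    found = [lab for lab in LABELS if lab in model_output.lower()]
    return found[-1] if found else "neither"


if __name__ == "__main__":
    # Hypothetical item: "Some artists are chemists" does NOT follow from these
    # premises (undistributed middle), so the gold label would be 'neither'.
    prompt = build_prompt(
        "All artists are beekeepers.",
        "Some beekeepers are chemists.",
        "Some artists are chemists.",
    )
    print(prompt)
    # response = query_llm(prompt)   # query_llm is a placeholder for any LLM call
    # print(parse_label(response))
```

The "neither" cases, which the abstract identifies as the weakest spot for current models, are exactly those where a plausible-sounding conclusion must be rejected because it is neither entailed nor contradicted by the premises.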
Related papers
- A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences [5.141416267381492]
We consider the case of syllogistic reasoning, an area of deductive reasoning studied extensively in logic and cognitive psychology.
We investigate the effects of chain-of-thought reasoning, in-context learning, and supervised fine-tuning on syllogistic reasoning.
Our results suggest that the behavior of pre-trained LLMs can be explained by cognitive science.
arXiv Detail & Related papers (2024-06-17T08:59:04Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models [39.77445889769015]
We show that, within the PaLM2 family of language models, larger models are more logical than smaller ones.
Even the largest models make systematic errors, some of which mirror human reasoning biases.
Overall, we find that language models often mimic the human biases included in their training data, but are able to overcome them in some cases.
arXiv Detail & Related papers (2023-11-01T11:13:06Z) - Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform these tasks by relying on answers memorized from the pretraining corpus or via a genuine multi-step reasoning mechanism.
We show that MechanisticProbe, the probing method introduced in the paper, is able to recover the reasoning tree from the model's attention patterns for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z) - Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases [8.583432139919616]
This paper investigates whether current large language models exhibit biases in logical reasoning, similar to humans.
We focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction.
We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects (each illustrated in the sketch after this list).
arXiv Detail & Related papers (2023-06-21T21:04:11Z) - Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z) - Natural Language Reasoning, A Survey [16.80326702160048]
Conceptually, we provide a distinct definition for natural language reasoning in NLP.
We conduct a comprehensive literature review on natural language reasoning in NLP.
The paper also identifies and discusses backward reasoning, a powerful paradigm for multi-step reasoning.
arXiv Detail & Related papers (2023-03-26T13:44:18Z) - Logical Reasoning over Natural Language as Knowledge Representation: A Survey [43.29703101875716]
This paper provides an overview on a new paradigm of logical reasoning, which uses natural language as knowledge representation and pretrained language models as reasoners.
This new paradigm is promising since it not only alleviates many challenges of formal representation but also has advantages over end-to-end neural methods.
arXiv Detail & Related papers (2023-03-21T16:56:05Z) - Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first and comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
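For readers unfamiliar with the three bias types studied in the NeuBAROCO papers, the sketch below gives one textbook-style example of each. The items are standard illustrations from the cognitive-science literature on syllogisms, written here as hypothetical data; none is taken from the NeuBAROCO dataset itself.

```python
# Illustrative (hypothetical) syllogisms for the three human bias types:
# belief bias, conversion error, and atmosphere effect.

BIAS_EXAMPLES = {
    # Belief bias: accepting an invalid argument because its conclusion is believable.
    "belief_bias": {
        "premises": ["All flowers need water.", "All roses need water."],
        "conclusion": "All roses are flowers.",
        "valid": False,  # believable conclusion, but it does not follow logically
    },
    # Conversion error: illicitly reading "All A are B" as if it also meant "All B are A".
    # The conclusion only follows if "All artists are beekeepers" is converted to
    # "All beekeepers are artists".
    "conversion_error": {
        "premises": ["All artists are beekeepers.", "All artists are chemists."],
        "conclusion": "All beekeepers are chemists.",
        "valid": False,
    },
    # Atmosphere effect: two "Some ..." premises invite a matching "Some ..."
    # conclusion even though no conclusion validly follows.
    "atmosphere_effect": {
        "premises": ["Some artists are beekeepers.", "Some beekeepers are chemists."],
        "conclusion": "Some artists are chemists.",
        "valid": False,
    },
}

if __name__ == "__main__":
    for bias, item in BIAS_EXAMPLES.items():
        print(f"{bias}: {item['premises']} -> {item['conclusion']} (valid={item['valid']})")
```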
This list is automatically generated from the titles and abstracts of the papers in this site.