Conditional and Modal Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2401.17169v2
- Date: Thu, 4 Jul 2024 18:12:25 GMT
- Title: Conditional and Modal Reasoning in Large Language Models
- Authors: Wesley H. Holliday, Matthew Mandelkern, Cedegao E. Zhang
- Abstract summary: We probe the extent to which twenty-five large language models are able to distinguish logically correct inferences from fallacious ones.
All models except those in the GPT-4 family often make basic mistakes with conditionals.
Almost all models give answers that do not match human judgments on certain complex conditional inferences widely discussed in the literature.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reasoning abilities of large language models (LLMs) are the topic of a growing body of research in AI and cognitive science. In this paper, we probe the extent to which twenty-five LLMs are able to distinguish logically correct inferences from logically fallacious ones. We focus on inference patterns involving conditionals (e.g., 'If Ann has a queen, then Bob has a jack') and epistemic modals (e.g., 'Ann might have an ace', 'Bob must have a king'). These inferences have been of special interest to logicians, philosophers, and linguists, since they play a central role in the fundamental human ability to reason about distal possibilities. Assessing LLMs on these inferences is thus highly relevant to the question of how much the reasoning abilities of LLMs match those of humans. Among the LLMs we tested, all but the GPT-4 model family often make basic mistakes with conditionals, though zero-shot chain-of-thought prompting helps them make fewer mistakes. Moreover, even the GPT-4 family displays logically inconsistent judgments across inference patterns involving epistemic modals, and almost all models give answers to certain complex conditional inferences widely discussed in the literature that do not match human judgments. These results highlight gaps in basic logical reasoning in today's LLMs.
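The paper's evaluation materials are not reproduced on this page, but a minimal sketch of the kind of probe the abstract describes might look like the following. The `make_probe` helper, the yes/no answer format, and the two example patterns (modus ponens and the fallacy of affirming the consequent) are illustrative assumptions, not the authors' actual prompts or dataset.

```python
# Minimal sketch of a conditional-inference probe in the style described in the
# abstract. The exact wording, the yes/no answer format, and the example
# patterns below are assumptions for illustration only.

ZERO_SHOT_COT = "Let's think step by step."

def make_probe(premises, conclusion, chain_of_thought=False):
    """Build a single probe asking whether a conclusion follows from the premises."""
    lines = ["Suppose the following premises are true:"]
    lines += [f"- {p}" for p in premises]
    lines.append(f"Does it follow that: {conclusion}")
    if chain_of_thought:
        # Zero-shot chain-of-thought cue, the prompting variant the abstract
        # reports as reducing mistakes with conditionals.
        lines.append(ZERO_SHOT_COT)
    lines.append("Answer 'yes' or 'no'.")
    return "\n".join(lines)

# A logically valid pattern (modus ponens); the expected answer is 'yes'.
valid = make_probe(
    ["If Ann has a queen, then Bob has a jack.", "Ann has a queen."],
    "Bob has a jack.",
)

# A fallacious pattern (affirming the consequent); the expected answer is 'no'.
fallacious = make_probe(
    ["If Ann has a queen, then Bob has a jack.", "Bob has a jack."],
    "Ann has a queen.",
    chain_of_thought=True,
)

print(valid)
print()
print(fallacious)
```

Scoring such probes reduces to comparing each model's yes/no answer against the logically correct one, and, for the contested complex conditionals the abstract mentions, against reported human judgments.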
Related papers
- A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners [58.15511660018742]
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities.
We develop carefully controlled synthetic datasets featuring conjunction fallacies and syllogistic problems.
arXiv Detail & Related papers (2024-06-16T19:22:53Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But can they really "reason" over natural language?
This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding [40.2816930342597]
Large Language Models (LLMs) have demonstrated good performance in many reasoning tasks.
But they still struggle with some complicated reasoning tasks including logical reasoning.
In this paper, we propose five concrete tasks spanning three cognitive dimensions: WHAT, WHY, and HOW.
arXiv Detail & Related papers (2024-04-04T08:38:03Z)
- Do Large Language Models Understand Logic or Just Mimick Context? [14.081178100662163]
This paper investigates the reasoning capabilities of large language models (LLMs) on two logical reasoning datasets.
It is found that LLMs do not truly understand logical rules; rather, in-context learning has simply enhanced the likelihood of these models arriving at the correct answers.
arXiv Detail & Related papers (2024-02-19T12:12:35Z)
- Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z)
- A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models [65.86149763739141]
We introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs.
We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco.
The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs at rates ranging from 25% to 94%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
- CLadder: Assessing Causal Reasoning in Language Models [82.8719238178569]
We investigate whether large language models (LLMs) can coherently reason about causality.
We propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
arXiv Detail & Related papers (2023-12-07T15:12:12Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism [19.590120229602103]
Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting.
In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation.
arXiv Detail & Related papers (2023-10-23T12:40:41Z)