Can Large Language Models Reason? A Characterization via 3-SAT
- URL: http://arxiv.org/abs/2408.07215v2
- Date: Tue, 22 Oct 2024 21:44:03 GMT
- Title: Can Large Language Models Reason? A Characterization via 3-SAT
- Authors: Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires, Luc De Raedt,
- Abstract summary: Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities.
Recent works have shown that LLMs often bypass true reasoning using shortcuts, sparking skepticism.
We propose an experimental protocol centered on 3-SAT -- the NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks.
- Score: 11.422434149376478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. However, recent works have shown that LLMs often bypass true reasoning using shortcuts, sparking skepticism. To study the reasoning capabilities in a principled fashion, we adopt a computational theory perspective and propose an experimental protocol centered on 3-SAT -- the prototypical NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks. Specifically, we examine the phase transitions in random 3-SAT and characterize the reasoning abilities of LLMs by varying the inherent hardness of the problem instances. Our experimental evidence shows that LLMs are incapable of performing true reasoning, as required for solving 3-SAT problems. Moreover, we observe significant performance variation based on the inherent hardness of the problems -- performing poorly on harder instances and vice versa. Importantly, we show that integrating external reasoners can considerably enhance LLM performance. By following a principled experimental protocol, our study draws concrete conclusions and moves beyond the anecdotal evidence often found in LLM reasoning research.
Related papers
- Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment [54.62926010621013]
We introduce a novel task, code reasoning, to provide a new perspective for the reasoning abilities of large language models.
We summarize three meta-benchmarks based on established forms of logical reasoning, and instantiate these into eight specific benchmark tasks.
We present a new pathway exploration pipeline inspired by human intricate problem-solving methods.
arXiv Detail & Related papers (2025-02-17T10:39:58Z) - CounterBench: A Benchmark for Counterfactuals Reasoning in Large Language Models [5.409370027524351]
We evaluate the performance of large language models (LLMs) in counterfactual reasoning.
We introduce a new benchmark dataset, CounterBench, comprising 1K counterfactual reasoning questions.
arXiv Detail & Related papers (2025-02-16T06:19:37Z) - Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art Large Language models (LLMs) continue to struggle when performing logical and mathematical reasoning.
This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation.
We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z) - What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance.
We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z) - Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA)
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study
on Syllogism [19.590120229602103]
Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting.
In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation.
arXiv Detail & Related papers (2023-10-23T12:40:41Z) - Concise and Organized Perception Facilitates Reasoning in Large Language Models [32.71672086718057]
We show that large language models (LLMs) exhibit failure patterns akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks.
We propose a novel reasoning approach named Concise and Organized Perception (COP)
COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently.
arXiv Detail & Related papers (2023-10-05T04:47:49Z) - Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems [17.80128896525717]
backward reasoning is relatively unexplored.
backward reasoning can be seen as the ''inverse'' of forward reasoning.
We propose variations of three different forward reasoning strategies to improve performance.
arXiv Detail & Related papers (2023-10-03T12:03:06Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter size ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.