Large Language Model for Science: A Study on P vs. NP
- URL: http://arxiv.org/abs/2309.05689v1
- Date: Mon, 11 Sep 2023 17:49:27 GMT
- Title: Large Language Model for Science: A Study on P vs. NP
- Authors: Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui,
Furu Wei
- Abstract summary: We use large language models (LLMs) to augment and accelerate research on the P versus NP problem.
Specifically, we propose Socratic reasoning, a general framework that promotes in-depth thinking with LLMs for complex problem-solving.
Our pilot study on the P vs. NP problem shows that GPT-4 successfully produces a proof schema and engages in rigorous reasoning throughout 97 dialogue turns.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we use large language models (LLMs) to augment and accelerate
research on the P versus NP problem, one of the most important open problems in
theoretical computer science and mathematics. Specifically, we propose Socratic
reasoning, a general framework that promotes in-depth thinking with LLMs for
complex problem-solving. Socratic reasoning encourages LLMs to recursively
discover, solve, and integrate problems while facilitating self-evaluation and
refinement. Our pilot study on the P vs. NP problem shows that GPT-4
successfully produces a proof schema and engages in rigorous reasoning
throughout 97 dialogue turns, concluding "P $\neq$ NP", which is in alignment
with (Xu and Zhou, 2023). The investigation uncovers novel insights within the
extensive solution space of LLMs, shedding light on LLM for Science.
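Viewed as a dialogue protocol, Socratic reasoning alternates between discovering sub-problems, solving them, integrating the partial answers, and asking the model to evaluate and refine the result. The sketch below only illustrates that loop under assumed prompt templates and a caller-supplied `ask_llm` function; it is not the authors' released implementation.

```python
from typing import Callable, List

def socratic_solve(problem: str, ask_llm: Callable[[str], str], depth: int = 2) -> str:
    """Illustrative Socratic-style loop: recursively discover, solve, and integrate
    sub-problems, then self-evaluate and refine. Prompt wording and the fixed
    recursion depth are assumptions for illustration."""
    if depth == 0:
        # Base case: answer the question directly.
        return ask_llm(f"Solve directly and concisely: {problem}")

    # Discover: ask the model to break the problem into smaller questions.
    sub_questions = ask_llm(
        f"List the key sub-problems, one per line, for: {problem}"
    ).splitlines()

    # Solve: recurse on each non-empty sub-problem.
    sub_answers: List[str] = [
        socratic_solve(q.strip(), ask_llm, depth - 1)
        for q in sub_questions
        if q.strip()
    ]

    # Integrate: combine the partial answers into one candidate solution.
    candidate = ask_llm(
        f"Integrate these partial results into a single answer for '{problem}':\n"
        + "\n".join(sub_answers)
    )

    # Self-evaluate and refine: ask the model to critique, then improve, its answer.
    critique = ask_llm(f"Point out flaws or gaps in this answer:\n{candidate}")
    return ask_llm(
        f"Refine the answer to '{problem}' using this critique:\n{critique}\n\n"
        f"Current answer:\n{candidate}"
    )
```

Any chat-completion backend can stand in for `ask_llm`; the sketch only aims to show the discover-solve-integrate-evaluate-refine structure described in the abstract, not the specific prompts used in the 97-turn dialogue.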
Related papers
- Can Large Language Models Reason? A Characterization via 3-SAT (2024-08-13)
  Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. Recent works have shown that LLMs often bypass true reasoning using shortcuts, sparking skepticism. The paper proposes an experimental protocol centered on 3-SAT, the NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks.
- Reasoning with Large Language Models, a Survey (2024-07-16)
  This paper reviews the rapidly expanding field of prompt-based reasoning with LLMs. Its taxonomy identifies different ways to generate, evaluate, and control multi-step reasoning, and it finds that self-improvement, self-reflection, and some meta abilities of the reasoning process are possible through the judicious use of prompts.
- Reinforcement Learning Problem Solving with Large Language Models (2024-04-29)
  Large Language Models (LLMs) hold an extensive amount of world knowledge, which has enabled their application across domains to improve the performance of natural language processing (NLP) tasks. This has also facilitated a more accessible paradigm of conversation-based interaction between humans and AI systems for solving intended problems. The paper shows the practicality of the approach through two detailed case studies, "Research Scientist" and "Legal Matter Intake".
- Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs (2024-04-11)
  Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of large language models. The paper proposes a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. Experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder.
- Logic Query of Thoughts: Guiding Large Language Models to Answer Complex Logic Queries with Knowledge Graphs (2024-03-17)
  Logic-Query-of-Thoughts (LGOT) is the first of its kind to combine knowledge graph reasoning and large language models. Experimental findings demonstrate substantial performance gains, with up to 20% improvement over ChatGPT.
- Self-Discover: Large Language Models Self-Compose Reasoning Structures (2024-02-06)
  SELF-DISCOVER is a framework for self-discovering task-intrinsic reasoning structures. It substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks, and the self-discovered reasoning structures are shown to be universally applicable across model families.
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? (2024-01-31)
  The paper studies the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems. It generates a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
- Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models (2023-10-06)
  Thought Propagation (TP) enhances the complex reasoning ability of large language models. TP first prompts LLMs to propose and solve a set of analogous problems related to the input one, then reuses the results of the analogous problems to directly yield a new solution, or to derive a knowledge-intensive plan for execution that amends the initial solution obtained from scratch.
- NLPBench: Evaluating Large Language Models on Solving NLP Problems (2023-09-27)
  Large language models (LLMs) have shown promise in enhancing the capabilities of natural language processing (NLP). The paper presents a benchmarking dataset, NLPBench, comprising 378 college-level NLP questions spanning various NLP topics, sourced from Yale University's prior final exams. The evaluation, centered on LLMs such as GPT-3.5/4, PaLM-2, and LLaMA-2, incorporates advanced prompting strategies such as chain-of-thought (CoT) and tree-of-thought (ToT).
- PAL: Program-aided Language Models (2022-11-18)
  Program-Aided Language models (PAL) understand natural language problems but offload the solution step to a programmatic runtime such as a Python interpreter, setting new state-of-the-art results on all 12 benchmarks. A minimal sketch of this offloading pattern appears after this list.
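To make the program-aided pattern concrete, here is a small illustrative sketch of offloading a word problem's solution step to the Python runtime. The `generate_program` stub, the example problem, and the `exec`-based runner are assumptions for illustration, not the PAL authors' implementation.

```python
# Sketch of program-aided solving: the language model is asked to emit Python
# that computes the answer, and the Python runtime (not the model) performs the
# actual calculation. `generate_program` stands in for a real LLM call and
# returns a hard-coded program for one example problem.

def generate_program(problem: str) -> str:
    """Placeholder for an LLM call that turns a word problem into Python code.

    A real system would prompt a model to write code ending with `answer = ...`.
    Here a fixed program is returned for illustration.
    """
    return (
        "pencils_per_box = 12\n"
        "boxes = 5\n"
        "given_away = 7\n"
        "answer = pencils_per_box * boxes - given_away\n"
    )

def solve(problem: str) -> object:
    """Offload the solution step to the Python interpreter."""
    program = generate_program(problem)
    namespace: dict = {}
    exec(program, namespace)  # the runtime, not the LLM, does the arithmetic
    return namespace["answer"]

if __name__ == "__main__":
    question = "A shop has 5 boxes of 12 pencils and gives away 7. How many remain?"
    print(solve(question))  # 53
```

The key design choice illustrated here is that the model only generates the program; the interpreter carries out the computation, which removes arithmetic errors from the model's side of the pipeline.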
This list is automatically generated from the titles and abstracts of the papers on this site.