Boosting Logical Reasoning in Large Language Models through a New
Framework: The Graph of Thought
- URL: http://arxiv.org/abs/2308.08614v1
- Date: Wed, 16 Aug 2023 18:13:27 GMT
- Title: Boosting Logical Reasoning in Large Language Models through a New
Framework: The Graph of Thought
- Authors: Bin Lei, Pei-Hung Lin, Chunhua Liao, Caiwen Ding
- Abstract summary: Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts (GoT)}.
Our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ on each respective task.
When juxtaposed with the state-of-the-art prompting method, \textit{Tree of Thought (ToT)}, our approach registered average accuracy boosts of $23\%$, $24\%$, and $15\%$.
- Score: 7.356034193515096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in large-scale models, such as GPT-4, have showcased
remarkable capabilities in addressing standard queries. However, when facing
complex problems that require multi-step logical reasoning, their accuracy
dramatically decreases. Current research has explored the realm of
\textit{prompting engineering} to bolster the inferential capacities of these
models. Our paper unveils a pioneering prompting technique, dubbed
\textit{Graph of Thoughts (GoT)}. Through testing on a trio of escalating
challenges: the 24-point game, resolution of high-degree polynomial equations,
and derivation of formulas for recursive sequences, our method outperformed
GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each
respective task. Moreover, when juxtaposed with the state-of-the-art (SOTA)
prompting method, \textit{Tree of Thought (ToT)}, our approach registered an
average accuracy boost of $23\%$, $24\%$, and $15\%$.
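The 24-point game benchmark is concrete enough to illustrate why a graph, rather than a tree, of intermediate thoughts helps: the same intermediate state can be reached along different operation orders, so merging duplicate states prunes the search. The sketch below is a plain state-space search over multisets of remaining numbers, not the paper's LLM-prompting pipeline:

```python
from itertools import combinations
from fractions import Fraction

def solve24(nums, target=24):
    """Search the graph of intermediate states of the 24-point game.

    Each node is a multiset of remaining values; each edge applies one
    arithmetic operation to a pair. The same node can be reached along
    different paths, so we memoize visited nodes -- the graph structure
    that the GoT framing exploits.
    """
    start = tuple(sorted(Fraction(n) for n in nums))
    seen = set()

    def expand(state):
        if state in seen:               # duplicate node: already explored
            return False
        seen.add(state)
        if len(state) == 1:
            return state[0] == target
        for i, j in combinations(range(len(state)), 2):
            a, b = state[i], state[j]
            rest = [state[k] for k in range(len(state)) if k not in (i, j)]
            results = {a + b, a - b, b - a, a * b}
            if b != 0:
                results.add(a / b)
            if a != 0:
                results.add(b / a)
            for r in results:
                if expand(tuple(sorted(rest + [r]))):
                    return True
        return False

    return expand(start)
```

Memoizing `seen` states is the graph-structured step; a pure tree search would re-expand every duplicate state it reaches.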
Related papers
- Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models [3.207886496235499]
We study the process through which reasoning models trained with reinforcement learning on verifiable rewards (RLVR) learn to solve new problems. We find that RLVR drives performance in two main ways: (1) by compressing pass@$k$ into pass@1 and (2) via "capability gain", in which models learn to solve new problems that they previously could not solve even at high $k$.
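For reference, the pass@$k$ quantity in this summary is usually computed with the standard unbiased estimator over $n$ attempts of which $c$ are correct; the formula below is the conventional one, not specific to this paper:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: the probability that at least one of
    k samples, drawn without replacement from n attempts of which c are
    correct, passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)
```

"Compressing pass@$k$ into pass@1" then means raising `pass_at_k(n, c, 1)` toward the model's previous `pass_at_k(n, c, k)` for large `k`.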
arXiv Detail & Related papers (2025-06-16T19:03:06Z)
- First Finish Search: Efficient Test-Time Scaling in Large Language Models [20.62274005080048]
First Finish Search (FFS) is a training-free parallel decoding strategy that launches $n$ independent samples and returns as soon as any one completes. FFS achieves $82.23\%$ accuracy on the AIME datasets, a $15\%$ improvement over DeepSeek-R1's standalone accuracy, nearly matching OpenAI's o4-mini performance.
arXiv Detail & Related papers (2025-05-23T17:57:43Z)
- From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
We analyze forgetting, i.e., the loss on previously seen tasks, after $k$ iterations. We develop novel last-iterate upper bounds in the realizable least squares setup. We prove for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences.
arXiv Detail & Related papers (2025-04-06T18:39:45Z)
- Boosting Multimodal Reasoning with Automated Structured Thinking [24.845193791363346]
AStar is a lightweight library of high-level reasoning patterns abstracted from 500 prior samples using Monte Carlo Tree Search. For each test problem, AStar adaptively retrieves the optimal thought cards and seamlessly integrates these external explicit guidelines with the model's internal implicit reasoning capabilities.
arXiv Detail & Related papers (2025-02-04T14:18:29Z)
- On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis [22.641550077885686]
We analyze the computational limits and efficiency criteria of Visual Autoregressive ($\mathsf{VAR}$) models.
We prove that, assuming the Strong Exponential Time Hypothesis ($\mathsf{SETH}$) from fine-grained complexity theory, a sub-quartic time algorithm for $\mathsf{VAR}$ models is impossible.
Our technique sheds light on advancing scalable and efficient image generation in $\mathsf{VAR}$ frameworks.
arXiv Detail & Related papers (2025-01-08T09:34:15Z)
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
- Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning [29.39584492735953]
We identify representation collapse in the model's intermediate layers as a key factor limiting their reasoning capabilities.
We propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse.
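The variance-covariance idea can be illustrated numerically. The toy penalty below is a simplification in the spirit of Seq-VCR, not the paper's exact loss: it rewards per-feature variance and decorrelated features in a batch of intermediate representations, so it is large precisely when representations collapse:

```python
import numpy as np

def variance_covariance_reg(h, eps=1e-4):
    """Toy variance-covariance penalty on representations h of shape
    [tokens, dim]. Collapsed (near-identical) representations incur a
    large variance term; correlated features incur a covariance term."""
    h = h - h.mean(axis=0)
    std = np.sqrt(h.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, 1.0 - std))   # push per-dim std toward >= 1
    cov = (h.T @ h) / (h.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / h.shape[1]    # penalize cross-feature covariance
    return var_loss + cov_loss
```

A collapsed batch (all rows identical) scores far higher than a batch of diverse representations, which is the behavior a regularizer against collapse needs.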
arXiv Detail & Related papers (2024-11-04T18:14:07Z)
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z)
- Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models [22.425339110551743]
We introduce \textit{weak-to-strong search}, framing the alignment of a large language model as a test-time greedy search.
In controlled-sentiment generation and summarization, we use tuned and untuned \texttt{gpt2}s to improve the alignment of large models without additional training.
In a more difficult instruction-following benchmark, we show that reusing off-the-shelf small models can improve the length-controlled win rates of both white-box and black-box large models.
arXiv Detail & Related papers (2024-05-29T16:55:32Z)
- DGoT: Dynamic Graph of Thoughts for Scientific Abstract Generation [4.404836880890741]
We propose a Dynamic Graph of Thought (DGoT) to solve the task of generating scientific paper abstracts.
Our method's cost-effectiveness in abstract generation tasks is only 43.7% to 56.4% of other multi-round query prompt approaches.
arXiv Detail & Related papers (2024-03-26T08:47:23Z)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models [52.31950122881687]
We introduce a new framework for language model inference, Tree of Thoughts (ToT).
ToT generalizes over the popular Chain of Thought approach to prompting language models.
Our experiments show that ToT significantly enhances language models' problem-solving abilities.
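At its core, ToT reduces to a breadth-limited search over partial "thoughts". In the toy sketch below, `propose` and `score` stand in for LLM-sampled thought candidates and LLM self-evaluation; the real method prompts a model for both:

```python
def tree_of_thoughts(root, propose, score, width=3, depth=2):
    """Minimal BFS-style ToT sketch: at each level, expand every kept
    partial thought with candidate next steps, score the candidates,
    and keep only the best `width` -- deliberate search in place of a
    single left-to-right chain of thought."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for thought in frontier for t in propose(thought)]
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return max(frontier, key=score)
```

With `width=1` this degenerates to greedy chain-of-thought; widening the frontier is what lets the search recover from locally poor thoughts.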
arXiv Detail & Related papers (2023-05-17T23:16:17Z)
- Progressive-Hint Prompting Improves Reasoning in Large Language Models [63.98629132836499]
This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP).
It enables automatic multiple interactions between users and Large Language Models (LLMs) by using previously generated answers as hints to progressively guide toward the correct answers.
We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient.
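The PHP interaction loop can be sketched in a few lines. Here `ask` stands in for an LLM call; previous answers are fed back as hints, and the loop stops once two consecutive answers agree (a simplified stopping rule for illustration):

```python
def progressive_hint(ask, question, max_rounds=4):
    """Sketch of Progressive-Hint Prompting: re-ask the question with
    all previously generated answers appended as hints, returning once
    the answer stabilizes across two consecutive rounds."""
    hints, answer = [], None
    for _ in range(max_rounds):
        prompt = question
        if hints:
            prompt += " (Hint: the answer is near " + ", ".join(hints) + ".)"
        new_answer = ask(prompt)
        if new_answer == answer:    # two consecutive identical answers
            return new_answer
        answer = new_answer
        hints.append(str(answer))
    return answer
```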
arXiv Detail & Related papers (2023-04-19T16:29:48Z)
- Reframing Instructional Prompts to GPTk's Language [72.69833640335519]
We propose reframing techniques for model designers to create effective prompts for language models.
Our results show that reframing improves few-shot learning performance by 14% while reducing sample complexity.
The performance gains are particularly important on large language models, such as GPT3 where tuning models or prompts on large datasets is not feasible.
arXiv Detail & Related papers (2021-09-16T09:44:43Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.