On the Empirical Complexity of Reasoning and Planning in LLMs
- URL: http://arxiv.org/abs/2404.11041v2
- Date: Tue, 18 Jun 2024 02:03:35 GMT
- Title: On the Empirical Complexity of Reasoning and Planning in LLMs
- Authors: Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee,
- Abstract summary: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs)
This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning.
- Score: 29.588100727466976
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLM in solving reasoning tasks in practice.
Related papers
- Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory [15.24542569393982]
Current models often rely on shortcuts rather than genuine multi-step reasoning, leading to performance degradation as task complexity increases.
Our contributions provide a theoretical framework based on complexity theory to explain SSMs' limitations.
Our experiments reveal significant performance drops as task complexity increases, even with Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-05-26T19:33:23Z) - Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a complex reasoning schema over KG upon large language models (LLMs)
We augment the arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs.
Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods.
arXiv Detail & Related papers (2024-05-02T18:12:08Z) - Evaluating LLMs' Mathematical Reasoning in Financial Document Question
Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets.
We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps.
We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z) - Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z) - When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - Unlocking Temporal Question Answering for Large Language Models Using
Code Execution [38.945784849917004]
Large language models (LLMs) have made significant progress in natural language processing (NLP)
We propose a novel framework that combines the extraction capability of LLMs and the logical reasoning capability of a Python solver to tackle this issue.
arXiv Detail & Related papers (2023-05-24T10:57:53Z) - Complex Logical Reasoning over Knowledge Graphs using Large Language Models [13.594992599230277]
Reasoning over knowledge graphs (KGs) is a challenging task that requires a deep understanding of the relationships between entities.
Current approaches rely on learning geometries to embed entities in vector space for logical query operations.
We propose a novel decoupled approach, Language-guided Abstract Reasoning over Knowledge graphs (LARK), that formulates complex KG reasoning as a combination of contextual KG search and logical query reasoning.
arXiv Detail & Related papers (2023-05-02T02:21:49Z) - Towards Understanding Chain-of-Thought Prompting: An Empirical Study of
What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs)
We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.