On the Empirical Complexity of Reasoning and Planning in LLMs
- URL: http://arxiv.org/abs/2404.11041v2
- Date: Tue, 18 Jun 2024 02:03:35 GMT
- Title: On the Empirical Complexity of Reasoning and Planning in LLMs
- Authors: Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee
- Abstract summary: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs).
This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning.
- Score: 29.588100727466976
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLM in solving reasoning tasks in practice.
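The abstract contrasts the linear structure of CoT with the tree structure of ToT. A toy search problem makes the difference concrete. This is a minimal sketch under stated assumptions, not the authors' experimental setup: the task (reach `target` from `start` using the moves `+3` and `*2`), the heuristic `score` that stands in for an LLM's judgement of a partial solution, and the names `linear_chain` and `tree_search` are all hypothetical. `linear_chain` commits to one best-looking step at a time and never backtracks (CoT-like). `tree_search` keeps alternative partial solutions and explores them breadth-first. This is a ToT-like stand-in; ToT itself can use BFS or DFS with an LLM-based evaluator.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Hypothetical toy task: reach `target` from `start` using these moves.
MOVES: List[Tuple[str, Callable[[int], int]]] = [
    ("+3", lambda x: x + 3),
    ("*2", lambda x: x * 2),
]


def score(value: int, target: int) -> int:
    """Stand-in for an LLM's judgement of a partial solution (smaller is better)."""
    return abs(target - value)


def linear_chain(start: int, target: int, max_steps: int) -> Optional[List[int]]:
    """CoT-like: greedily commit to the best-scoring next step, never backtrack."""
    path = [start]
    for _ in range(max_steps):
        if path[-1] == target:
            return path
        candidates = [move(path[-1]) for _, move in MOVES]
        path.append(min(candidates, key=lambda v: score(v, target)))
    return path if path[-1] == target else None


def tree_search(start: int, target: int, max_steps: int) -> Optional[List[int]]:
    """ToT-like stand-in: breadth-first search that keeps alternative partial paths."""
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 < max_steps:  # number of steps taken so far
            for _, move in MOVES:
                frontier.append(path + [move(path[-1])])
    return None


print(linear_chain(1, 10, max_steps=5))  # None
print(tree_search(1, 10, max_steps=5))   # [1, 4, 7, 10]
```

With `start=1` and `target=10`, the greedy chain drifts through 1, 4, 8, 11, 14, 17 and never recovers. The tree search keeps the locally worse branch 1, 4, 7 alive and finds 1, 4, 7, 10. The price is a frontier that can grow exponentially with depth (up to 2^5 leaves here).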
Related papers
- When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs).
However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy?
In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
Date: 2025-02-11T05:28:59Z
- Code Simulation as a Proxy for High-order Tasks in Large Language Models [6.71786454125056]
We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLMs).
We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks.
Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
Date: 2025-02-05T19:30:28Z
- Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA).
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
Date: 2024-10-18T05:30:33Z
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
Date: 2024-09-18T17:55:00Z
- Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory [15.24542569393982]
Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition.
We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks.
We highlight the need for innovative solutions to achieve reliable multi-step reasoning and compositional task-solving.
Date: 2024-05-26T19:33:23Z
- Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
Date: 2023-10-09T06:57:45Z
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
Date: 2023-08-29T17:22:39Z
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
We show that CoT reasoning is possible even with invalid demonstrations.
Date: 2022-12-20T05:20:54Z
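The *When Do Program-of-Thoughts Work for Reasoning?* summary says CIRS uses the abstract syntax tree to encode structural information and calculate logical complexity. The following sketch only illustrates the general idea of reading structural features off a program's AST with Python's standard `ast` module. The function name `ast_features`, the three features (`num_nodes`, `max_depth`, `num_control_nodes`), and the choice of which node types count as control flow are hypothetical. They are not the CIRS definition.

```python
import ast

# Node types treated as "control flow" here (a hypothetical choice, not CIRS's).
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)


def max_depth(node: ast.AST) -> int:
    """Depth of the AST rooted at `node`; a node with no children has depth 1."""
    return 1 + max((max_depth(child) for child in ast.iter_child_nodes(node)), default=0)


def ast_features(source: str) -> dict:
    """Hypothetical structural features of a Python program, read off its AST."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))  # every node in the tree, including the root
    return {
        "num_nodes": len(nodes),
        "max_depth": max_depth(tree),
        "num_control_nodes": sum(isinstance(n, CONTROL_NODES) for n in nodes),
    }


simple = "total = 1 + 2"
looped = """
def positive_sum(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""
print(ast_features(simple))
print(ast_features(looped))
```

One design detail: `ast.walk` also yields expression-context nodes such as `Load` and `Store`, so `num_nodes` counts them as well. A real complexity measure would have to decide whether to include them.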
This list is automatically generated from the titles and abstracts of the papers on this site.