Related papers: On the Empirical Complexity of Reasoning and Planning in LLMs

On the Empirical Complexity of Reasoning and Planning in LLMs

URL: http://arxiv.org/abs/2404.11041v2
Date: Tue, 18 Jun 2024 02:03:35 GMT
Title: On the Empirical Complexity of Reasoning and Planning in LLMs
Authors: Liwei Kang, Zirui Zhao, David Hsu, Wee Sun Lee,
Abstract summary: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs) This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning.
Score: 29.588100727466976
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLM in solving reasoning tasks in practice.

Related papers

Code Simulation as a Proxy for High-order Tasks in Large Language Models [6.71786454125056]
We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLM) We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks. Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
arXiv Detail & Related papers (2025-02-05T19:30:28Z)
Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question. To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA) Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z)
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs) We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory [15.24542569393982]
Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We highlight the need for innovative solutions to achieve reliable multi-step reasoning and compositional task-solving.
arXiv Detail & Related papers (2024-05-26T19:33:23Z)
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based logical-aware instruction tuning framework, named LACT. Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition. Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods, achieving the new state-of-the-art.
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level. We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z)
When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
Complex Logical Reasoning over Knowledge Graphs using Large Language Models [13.594992599230277]
Reasoning over knowledge graphs (KGs) is a challenging task that requires a deep understanding of the relationships between entities. Current approaches rely on learning geometries to embed entities in vector space for logical query operations. We propose a novel decoupled approach, Language-guided Abstract Reasoning over Knowledge graphs (LARK), that formulates complex KG reasoning as a combination of contextual KG search and logical query reasoning.
arXiv Detail & Related papers (2023-05-02T02:21:49Z)
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs) We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.