Towards Revealing the Mystery behind Chain of Thought: A Theoretical
  Perspective
        - URL: http://arxiv.org/abs/2305.15408v5
- Date: Sat, 23 Dec 2023 02:39:56 GMT
- Title: Towards Revealing the Mystery behind Chain of Thought: A Theoretical
  Perspective
- Authors: Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
- Abstract summary: Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs)
We show that CoT can handle a general class of decision-making problems known as Dynamic Programming.
- Score: 39.47116013338394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Recent studies have discovered that Chain-of-Thought prompting (CoT) can
dramatically improve the performance of Large Language Models (LLMs),
particularly when dealing with complex tasks involving mathematics or
reasoning. Despite the enormous empirical success, the underlying mechanisms
behind CoT and how it unlocks the potential of LLMs remain elusive. In this
paper, we take a first step towards theoretically answering these questions.
Specifically, we examine the expressivity of LLMs with CoT in solving
fundamental mathematical and decision-making problems. By using circuit
complexity theory, we first give impossibility results showing that
bounded-depth Transformers are unable to directly produce correct answers for
basic arithmetic/equation tasks unless the model size grows super-polynomially
with respect to the input length. In contrast, we then prove by construction
that autoregressive Transformers of constant size suffice to solve both tasks
by generating CoT derivations using a commonly used math language format.
Moreover, we show LLMs with CoT can handle a general class of decision-making
problems known as Dynamic Programming, thus justifying its power in tackling
complex real-world tasks. Finally, an extensive set of experiments show that,
while Transformers always fail to directly predict the answers, they can
consistently learn to generate correct solutions step-by-step given sufficient
CoT demonstrations.
 
      
        Related papers
        - RL for Reasoning by Adaptively Revealing Rationales [36.50924054394857]
 Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows.<n>We address this by adaptive backtracking (AdaBack), a per-sample curriculum learning algorithm that reveals only a partial prefix of the target output during training.<n>We show that our adaptive curriculum over partial answers reliably solves problems that are otherwise intractable.
 arXiv  Detail & Related papers  (2025-06-22T17:46:14Z)
- Computational Thinking Reasoning in Large Language Models [69.28428524878885]
 Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
 arXiv  Detail & Related papers  (2025-06-03T09:11:15Z)
- Why Does Your CoT Prompt (Not) Work? Theoretical Analysis of Prompt   Space Complexity, its Interaction with Answer Space During CoT Reasoning with   LLMs: A Recurrent Perspective [15.941209553757274]
 Chain-of-Thought (CoT) prompting has emerged as a practical solution to the limitations of Large Language Models (LLMs)
This paper provides a rigorous theoretical analysis of the complexity and interplay between two crucial spaces: the prompt space and the answer space.
We show that sometimes human supervision is critical for efficiently navigating the prompt space.
 arXiv  Detail & Related papers  (2025-03-13T06:11:10Z)
- Towards Understanding Multi-Round Large Language Model Reasoning:   Approximability, Learnability and Generalizability [18.54202114336492]
 We investigate the approximation, learnability, and generalization properties of multi-round auto-regressive models.
We show that Transformers with finite context windows are universal approximators for steps of Turing-computable functions.
We extend PAC learning to sequence generation and demonstrate that multi-round generation is learnable even when the sequence length exceeds the model's context window.
 arXiv  Detail & Related papers  (2025-03-05T02:50:55Z)
- When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
 Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs)
However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy?
In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
 arXiv  Detail & Related papers  (2025-02-11T05:28:59Z)
- Provably Transformers Harness Multi-Concept Word Semantics for Efficient   In-Context Learning [53.685764040547625]
 Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities.
This work provides a fine mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities.
 arXiv  Detail & Related papers  (2024-11-04T15:54:32Z)
- Ask, and it shall be given: Turing completeness of prompting [47.08833920586575]
 Large language models (LLMs) have been revolutionizing machine learning and initiated the so-called LLM prompting paradigm.
Here, we present the first theoretical study on the LLM prompting paradigm to the best of our knowledge.
We show that prompting is Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function.
 arXiv  Detail & Related papers  (2024-11-04T11:26:38Z)
- Supervised Chain of Thought [5.389461633686935]
 Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
One-prompt-for-all approach poses significant challenges for models to generate the correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
 arXiv  Detail & Related papers  (2024-10-18T06:25:27Z)
- Training Nonlinear Transformers for Chain-of-Thought Inference: A   Theoretical Generalization Analysis [82.51626700527837]
 Chain-of-shift (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query using examples with multiple intermediate steps.
We show that despite the theoretical success of CoT, it fails to provide an accurate generalization when CoT does.
 arXiv  Detail & Related papers  (2024-10-03T03:12:51Z)
- Chain of Thought Empowers Transformers to Solve Inherently Serial   Problems [57.58801785642868]
 Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
 arXiv  Detail & Related papers  (2024-02-20T10:11:03Z)
- On Limitations of the Transformer Architecture [15.329285967441372]
 We show that the Transformer layer is incapable of composing functions if the domains of the functions are large enough.
We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers.
 arXiv  Detail & Related papers  (2024-02-13T01:52:15Z)
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
 We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
 arXiv  Detail & Related papers  (2023-05-29T23:24:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.