Towards Revealing the Mystery behind Chain of Thought: A Theoretical
Perspective
- URL: http://arxiv.org/abs/2305.15408v5
- Date: Sat, 23 Dec 2023 02:39:56 GMT
- Title: Towards Revealing the Mystery behind Chain of Thought: A Theoretical
Perspective
- Authors: Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
- Abstract summary: Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs).
We show that CoT can handle a general class of decision-making problems known as Dynamic Programming.
- Score: 39.47116013338394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have discovered that Chain-of-Thought prompting (CoT) can
dramatically improve the performance of Large Language Models (LLMs),
particularly when dealing with complex tasks involving mathematics or
reasoning. Despite the enormous empirical success, the underlying mechanisms
behind CoT and how it unlocks the potential of LLMs remain elusive. In this
paper, we take a first step towards theoretically answering these questions.
Specifically, we examine the expressivity of LLMs with CoT in solving
fundamental mathematical and decision-making problems. By using circuit
complexity theory, we first give impossibility results showing that
bounded-depth Transformers are unable to directly produce correct answers for
basic arithmetic/equation tasks unless the model size grows super-polynomially
with respect to the input length. In contrast, we then prove by construction
that autoregressive Transformers of constant size suffice to solve both tasks
by generating CoT derivations using a commonly used math language format.
Moreover, we show LLMs with CoT can handle a general class of decision-making
problems known as Dynamic Programming, thus justifying its power in tackling
complex real-world tasks. Finally, an extensive set of experiments show that,
while Transformers always fail to directly predict the answers, they can
consistently learn to generate correct solutions step-by-step given sufficient
CoT demonstrations.
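The abstract's Dynamic Programming result hinges on the fact that each DP subproblem depends on only a few previously solved subproblems, so a solution can be emitted as a sequence of bounded-size intermediate steps. As a minimal, hypothetical sketch (not code from the paper; the function name and step format are illustrative assumptions), edit distance shows the kind of step-by-step derivation an autoregressive model with CoT could generate cell by cell:

```python
def edit_distance_with_steps(a: str, b: str):
    """Return (distance, steps), where steps records each DP cell as it is
    filled -- the 'chain of thought' a model could emit one step at a time."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    steps = []
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j  # insert all of b[:j]
            elif j == 0:
                dp[i][j] = i  # delete all of a[:i]
            elif a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # characters match, no edit
            else:
                # substitution, deletion, or insertion: each cell depends
                # only on three earlier cells -- a bounded-size step
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            steps.append(f"dp[{i}][{j}] = {dp[i][j]}")
    return dp[m][n], steps

dist, steps = edit_distance_with_steps("kitten", "sitting")
print(dist)        # 3
print(len(steps))  # 56 -- one bounded-size intermediate step per DP cell
```

Each emitted step is constant-size, while producing the final answer directly would require the model to internalize the whole table in one forward pass, matching the paper's contrast between direct prediction and CoT generation.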
Related papers
- Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities.
This work provides a fine-grained mathematical analysis showing how transformers leverage the multi-concept semantics of words to enable powerful in-context learning (ICL) and strong out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z)
- Ask, and it shall be given: Turing completeness of prompting [47.08833920586575]
Large language models (LLMs) have been revolutionizing machine learning and initiated the so-called LLM prompting paradigm.
To the best of our knowledge, we present the first theoretical study of the LLM prompting paradigm.
We show that prompting is Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function.
arXiv Detail & Related papers (2024-11-04T11:26:38Z)
- Supervised Chain of Thought [5.389461633686935]
Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
The one-prompt-for-all approach poses significant challenges for models to generate correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
arXiv Detail & Related papers (2024-10-18T06:25:27Z)
- Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-Thought (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query with examples containing multiple intermediate steps.
We show that direct in-context learning can fail to generalize accurately on tasks where CoT succeeds.
arXiv Detail & Related papers (2024-10-03T03:12:51Z)
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method for improving the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z)
- On Limitations of the Transformer Architecture [15.329285967441372]
We show that the Transformer layer is incapable of composing functions if the domains of the functions are large enough.
We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers.
arXiv Detail & Related papers (2024-02-13T01:52:15Z)
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.