To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
- URL: http://arxiv.org/abs/2505.19245v1
- Date: Sun, 25 May 2025 17:49:37 GMT
- Title: To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
- Authors: Kevin Xu, Issei Sato
- Abstract summary: Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks. We provide a formal analysis of their respective strengths and limitations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks and to theoretically enhance expressivity by recursively increasing the number of computational steps. However, their comparative capabilities are still not well understood. In this paper, we provide a formal analysis of their respective strengths and limitations. We show that Looped Transformers can efficiently simulate parallel computations for deterministic tasks, which we formalize as evaluation over directed acyclic graphs. In contrast, CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems. These separations suggest the tasks for which depth-driven recursion is more suitable, thereby offering practical cues for choosing between reasoning paradigms.
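To make the DAG-evaluation task class concrete, here is a minimal sketch (an illustration under hypothetical naming, not the paper's construction): a DAG of operations is evaluated layer by layer, with every ready node firing at once, so the number of rounds equals the DAG depth. This is the kind of parallel deterministic computation the abstract attributes to Looped Transformers.

```python
# Minimal sketch (not from the paper) of deterministic evaluation over a
# directed acyclic graph. Each round evaluates every node whose inputs are
# ready, so the number of rounds equals the DAG depth, the quantity a
# Looped Transformer's iteration count would track under this framing.
from typing import Callable, Dict, List, Tuple

Node = str
Graph = Dict[Node, Tuple[Callable[..., int], List[Node]]]  # node -> (op, parents)

def evaluate_dag(graph: Graph, inputs: Dict[Node, int]) -> Dict[Node, int]:
    values = dict(inputs)              # leaf values are given
    pending = set(graph) - set(values)
    rounds = 0
    while pending:
        # All nodes whose parents are already computed fire "in parallel".
        ready = {n for n in pending if all(p in values for p in graph[n][1])}
        for n in ready:
            op, parents = graph[n]
            values[n] = op(*(values[p] for p in parents))
        pending -= ready
        rounds += 1                    # one loop iteration per DAG layer
    print(f"evaluated in {rounds} rounds (= DAG depth)")
    return values

# Example: (a + b) * (a - c)
g: Graph = {
    "s": (lambda x, y: x + y, ["a", "b"]),
    "d": (lambda x, y: x - y, ["a", "c"]),
    "out": (lambda x, y: x * y, ["s", "d"]),
}
print(evaluate_dag(g, {"a": 5, "b": 2, "c": 1})["out"])  # 28, after 2 rounds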
Related papers
- Fractured Chain-of-Thought Reasoning
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
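For context on the Pass@k metric cited above: the standard unbiased estimator (introduced for code generation evaluation by Chen et al., 2021, not specific to this paper) is pass@k = 1 - C(n-c, k)/C(n, k), for n samples of which c are correct. A minimal sketch:

```python
# Standard unbiased Pass@k estimator: the probability that at least one
# of k draws (without replacement) from n samples with c correct ones
# is correct. Background material, not code from this paper.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer wrong samples than draws: a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=3, k=5))  # ~0.60
```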
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
- Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
We prove that a two-layer transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem. In our construction, each continuous thought vector is a superposition state that encodes multiple search frontiers simultaneously.
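The superposition intuition matches breadth-first search, where a single state (here, a set) encodes every current search frontier at once. The sketch below is only an analogy, not the paper's transformer construction: $D$ expansion steps decide reachability within $D$ hops, mirroring $D$ continuous CoT steps.

```python
# Illustrative analogy only: frontier sets as "superposed" search states.
from typing import Dict, List, Set

def reachable(adj: Dict[int, List[int]], source: int, target: int, D: int) -> bool:
    frontier: Set[int] = {source}      # all frontiers held simultaneously
    visited: Set[int] = {source}
    for _ in range(D):                 # one step per continuous thought
        if target in frontier:
            return True
        frontier = {v for u in frontier for v in adj.get(u, []) if v not in visited}
        visited |= frontier
    return target in frontier

adj = {0: [1, 2], 1: [3], 2: [3, 4], 4: [5]}
print(reachable(adj, source=0, target=5, D=3))  # True: 0 -> 2 -> 4 -> 5
```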
arXiv Detail & Related papers (2025-05-18T18:36:53Z)
- Dynamic Parallel Tree Search for Efficient LLM Reasoning
Tree of Thoughts (ToT) enhances Large Language Model (LLM) reasoning by structuring problem-solving as a spanning tree. We propose Dynamic Parallel Tree Search (DPTS), a novel parallelism framework that dynamically optimizes the reasoning path during inference. Experiments on Qwen-2.5 and Llama-3 with the Math500 and GSM8K datasets show that DPTS improves efficiency by 2-4x on average.
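As a generic sketch of parallel tree search in the spirit of this entry (DPTS's actual scoring and scheduling are not reproduced here): each round expands a batch of the most promising reasoning nodes at once instead of deepening a single path. `expand` and `score` are stand-ins for an LLM proposal step and value function.

```python
# Generic parallel tree search sketch; not the DPTS algorithm itself.
import heapq
from typing import Callable, List, Tuple

def parallel_tree_search(root: str,
                         expand: Callable[[str], List[str]],
                         score: Callable[[str], float],
                         width: int = 4, rounds: int = 3) -> str:
    frontier: List[Tuple[float, str]] = [(-score(root), root)]
    best = root
    for _ in range(rounds):
        # Pop up to `width` best nodes and expand them as one batch.
        batch = [heapq.heappop(frontier)[1] for _ in range(min(width, len(frontier)))]
        for node in batch:
            for child in expand(node):
                heapq.heappush(frontier, (-score(child), child))
                if score(child) > score(best):
                    best = child
    return best

# Toy usage: nodes are strings; longer strings score higher.
print(parallel_tree_search("x", expand=lambda s: [s + "a", s + "b"],
                           score=len, width=2, rounds=2))  # "xaa"
```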
arXiv Detail & Related papers (2025-02-22T14:13:37Z)
- Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. Looped Transformers possess remarkable length-generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. We propose RELAY to better leverage the strengths of Looped Transformers.
arXiv Detail & Related papers (2025-02-12T15:17:04Z)
- Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures
We introduce Adaptive Graph of Thoughts (AGoT), a dynamic, graph-based inference framework. AGoT enhances Large Language Model (LLM) reasoning solely at test time. We validate our approach on diverse benchmarks spanning multi-hop retrieval, scientific reasoning, and mathematical problem-solving.
arXiv Detail & Related papers (2025-02-07T16:54:19Z)
- Pattern Tree: Enhancing Efficiency in Quantum Circuit Optimization Based on Pattern-matching
We propose a novel framework for quantum circuit optimization that uses pattern matching to enhance efficiency. We show that pattern-tree-based pattern matching can reduce execution time by an average of 20% on a well-accepted benchmark set.
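The summary does not specify the paper's pattern-tree data structure, so the following is only a hedged illustration of the general idea: a trie over gate sequences lets shared pattern prefixes be tested once rather than once per pattern, with matched subsequences replaced by their rewrites.

```python
# Trie-based pattern matching over gate sequences; an illustration of the
# general technique, not the paper's implementation.
from typing import Dict, List, Optional

class PatternTrie:
    def __init__(self) -> None:
        self.children: Dict[str, "PatternTrie"] = {}
        self.rewrite: Optional[List[str]] = None  # replacement if a pattern ends here

    def insert(self, pattern: List[str], rewrite: List[str]) -> None:
        node = self
        for gate in pattern:
            node = node.children.setdefault(gate, PatternTrie())
        node.rewrite = rewrite

def optimize(circuit: List[str], trie: PatternTrie) -> List[str]:
    out, i = [], 0
    while i < len(circuit):
        node, match, j = trie, None, i
        while j < len(circuit) and circuit[j] in node.children:
            node = node.children[circuit[j]]
            j += 1
            if node.rewrite is not None:
                match = (j, node.rewrite)         # remember the longest match
        if match:
            i, rewrite = match
            out.extend(rewrite)
        else:
            out.append(circuit[i])
            i += 1
    return out

trie = PatternTrie()
trie.insert(["H", "H"], [])   # adjacent Hadamards cancel
trie.insert(["X", "X"], [])   # adjacent X gates cancel
print(optimize(["H", "H", "X", "CX", "X", "X"], trie))  # ['X', 'CX']
```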
arXiv Detail & Related papers (2024-12-09T07:21:11Z)
- Transformers Provably Solve Parity Efficiently with Chain of Thought
This work provides the first theoretical analysis of training transformers to solve complex problems. We consider training a one-layer transformer to solve the fundamental $k$-parity problem.
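Stated concretely, the $k$-parity problem asks: given $x \in \{0,1\}^d$, output the XOR of a fixed subset of $k$ coordinates, where the learner must both identify the relevant coordinates and compute their parity. A small sketch of the target function (the dimension and index set below are illustrative):

```python
# The k-parity target function; d and the support are hypothetical choices.
from itertools import product

def k_parity(x: tuple, support: tuple) -> int:
    """XOR of the coordinates of x indexed by `support`."""
    acc = 0
    for i in support:
        acc ^= x[i]
    return acc

d, support = 3, (0, 2)  # illustrative dimension and relevant coordinates
for x in product((0, 1), repeat=d):
    print(x, "->", k_parity(x, support))
```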
arXiv Detail & Related papers (2024-10-11T08:55:17Z)
- On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
We establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance.
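As background, the usual form of a modulus of continuity for a sequence-to-sequence map $f$ is given below; the paper's exact norm and function class may differ, so treat this as the standard definition rather than the paper's.

```latex
% Standard modulus of continuity for a sequence-to-sequence map f;
% the choice of norm is an assumption, not taken from the paper.
\[
  \omega_f(\delta) \;=\; \sup_{\|X - X'\| \le \delta} \big\| f(X) - f(X') \big\|
\]
% An approximation rate then bounds the error of a T-loop Transformer
% in terms of \omega_f evaluated at a resolution that shrinks as T grows.
```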
arXiv Detail & Related papers (2024-10-02T10:31:17Z)
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Tractable Bounding of Counterfactual Queries by Knowledge Compilation
We discuss the problem of bounding partially identifiable queries, such as counterfactuals, in Pearlian structural causal models.
A recently proposed iterated EM scheme yields an inner approximation of those bounds by sampling the initialisation parameters.
We show how a single symbolic knowledge compilation yields the circuit structure once, with symbolic parameters that can then be replaced by their actual values.
arXiv Detail & Related papers (2023-10-05T07:10:40Z)
- Multi-task Learning of Order-Consistent Causal Graphs
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs).
Under a multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models.
We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
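A sketch of the joint objective under common group-sparsity conventions (the paper's exact likelihood, penalty, and notation may differ): with $B^{(k)}$ the coefficient matrix of the $k$-th linear SEM fitted to data $X^{(k)}$, an $l_1/l_2$-regularized MLE couples the $K$ tasks through a group penalty on corresponding edges.

```latex
% Illustrative group-penalized joint MLE; notation is assumed, not quoted.
\[
  \min_{B^{(1)}, \dots, B^{(K)}}
  \; \sum_{k=1}^{K} \ell\big(B^{(k)}; X^{(k)}\big)
  \; + \; \lambda \sum_{i \ne j}
    \Big( \sum_{k=1}^{K} \big(B^{(k)}_{ij}\big)^2 \Big)^{1/2}
\]
% Here \ell is the negative Gaussian log-likelihood; the l2 norm over
% tasks encourages the K graphs to share edges, and hence causal order.
```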
arXiv Detail & Related papers (2021-11-03T22:10:18Z)