Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
- URL: http://arxiv.org/abs/2505.21825v1
- Date: Tue, 27 May 2025 23:23:34 GMT
- Title: Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
- Authors: Parsa Mirtaheri, Ezra Edelman, Samy Jelassi, Eran Malach, Enric Boix-Adsera
- Abstract summary: We show the existence of reasoning settings where sequential scaling offers an exponential advantage over parallel scaling. We validate our theoretical findings with comprehensive experiments across a range of language models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference-time computation has emerged as a promising scaling axis for improving large language model reasoning. However, despite yielding impressive performance, the optimal allocation of inference-time computation remains poorly understood. A central question is whether to prioritize sequential scaling (e.g., longer chains of thought) or parallel scaling (e.g., majority voting across multiple short chains of thought). In this work, we seek to illuminate the landscape of test-time scaling by demonstrating the existence of reasoning settings where sequential scaling offers an exponential advantage over parallel scaling. These settings are based on graph connectivity problems in challenging distributions of graphs. We validate our theoretical findings with comprehensive experiments across a range of language models, including models trained from scratch for graph connectivity with different chain of thought strategies as well as large reasoning models.
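The exponential separation can be made concrete with a back-of-the-envelope calculation. Below is a toy sketch, not the paper's exact construction: under the illustrative assumption that each short chain of thought beats chance on an n-vertex connectivity instance by only 2^-n, a majority vote needs on the order of 4^n parallel chains, while a single sequential chain that traverses the graph needs only about n steps.

```python
# Toy illustration, not the paper's construction: assume each short CoT is
# correct with probability 1/2 + eps, where eps = 2^-n (an illustrative
# assumption). A Chernoff bound says a majority vote over k chains errs with
# probability <= exp(-2*k*eps^2), so k ~ eps^-2 = 4^n chains are needed,
# while one sequential chain doing BFS resolves connectivity in ~n steps.
import math

def parallel_chains_needed(eps: float, delta: float = 0.05) -> float:
    """Smallest k with exp(-2*k*eps^2) <= delta."""
    return math.log(1.0 / delta) / (2.0 * eps ** 2)

for n in range(2, 12, 2):
    eps = 2.0 ** (-n)
    k = parallel_chains_needed(eps)
    print(f"n={n:2d}  sequential steps ~ {n:2d}  parallel chains ~ {k:.3e}")
```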
Related papers
- Fractured Chain-of-Thought Reasoning
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
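For reference, Pass@k in the entry above is the standard coverage metric; a minimal sketch of its usual unbiased estimator (from Chen et al., 2021, "Evaluating Large Language Models Trained on Code") follows, given n samples of which c are correct.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021): with n samples and
# c correct ones, pass@k = 1 - C(n-c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:          # every size-k subset contains a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=7, k=10))  # ~0.53
```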
- Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning
We introduce a synthetic multihop reasoning environment designed to replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling law that linearly maps the knowledge-graph search entropy to the optimal model size.
arXiv Detail & Related papers (2025-04-04T17:57:22Z)
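The entry above states only that optimal model size scales linearly with knowledge-graph search entropy; a hedged sketch of what such a fit would look like follows, with slope and intercept entirely made up, since the paper's fitted coefficients are not reproduced here.

```python
# Hypothetical fit: the abstract above reports a linear map from search
# entropy H to optimal model size N*. The coefficients a, b below are
# placeholders, not values from the paper.
def optimal_model_size(h_search: float, a: float = 2.0e8, b: float = 1.0e8) -> float:
    """Illustrative linear scaling law: N*(H) = a*H + b."""
    return a * h_search + b

print(f"N* ~ {optimal_model_size(3.0):,.0f} parameters (with made-up a, b)")
```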
- Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Landscape of Thoughts is a tool to inspect the reasoning paths of chain-of-thought on any multiple-choice dataset. It distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty.
arXiv Detail & Related papers (2025-03-28T06:09:51Z)
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. However, we explore whether scaling with longer CoTs can in fact impair the reasoning performance of Large Language Models (LLMs) in certain domains.
arXiv Detail & Related papers (2025-02-25T10:48:05Z)
- How Do Large Language Monkeys Get Their Power (Laws)?
We identify an apparent puzzle: a simple mathematical calculation predicts that on each problem the failure rate should fall exponentially with the number of attempts, whereas aggregate success rates are empirically observed to follow a power law. We then resolve this puzzle by demonstrating that per-problem exponential scaling can be made consistent with aggregate power-law scaling. Our work contributes to a better understanding of how neural language model performance improves with scaling of inference compute.
arXiv Detail & Related papers (2025-02-24T19:01:47Z)
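The resolution sketched in the abstract above has a simple mechanism: if the per-problem success probability p varies across problems with enough mass near zero, averaging the exponentially decaying per-problem failure (1-p)^k produces a power law in k. A toy simulation follows, with a Beta(0.5, 5) distribution over p as a purely illustrative choice; the estimated exponent approaches the Beta shape parameter 0.5.

```python
# Toy check: per problem, failure after k attempts is (1-p)^k (exponential),
# but averaging over problems with p ~ Beta(0.5, 5) (illustrative choice)
# gives aggregate failure ~ k^-0.5, i.e. a power law with exponent alpha.
import math, random

random.seed(0)
probs = [random.betavariate(0.5, 5.0) for _ in range(200_000)]

def aggregate_failure(k: int) -> float:
    return sum((1.0 - p) ** k for p in probs) / len(probs)

ks = [16, 64, 256, 1024]
fs = [aggregate_failure(k) for k in ks]
for i in range(len(ks) - 1):
    slope = math.log(fs[i] / fs[i + 1]) / math.log(ks[i + 1] / ks[i])
    print(f"k: {ks[i]:4d} -> {ks[i+1]:4d}  estimated power-law exponent ~ {slope:.2f}")
```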
- Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
We study whether o1-like large language models (LLMs) truly possess test-time scaling capabilities. We find that longer CoTs of these o1-like models do not consistently enhance accuracy. We propose Shortest Majority Vote, a method that combines parallel scaling strategies with CoT length characteristics.
arXiv Detail & Related papers (2025-02-17T07:21:11Z)
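The summary above does not spell out how Shortest Majority Vote combines the two signals. One plausible reading, sketched below with an assumed scoring rule (rank answers by vote count, breaking ties toward shorter mean CoT length), is:

```python
# Hedged sketch of a "Shortest Majority Vote"-style rule; the exact scoring
# in the paper may differ. Assumption here: rank candidate answers by vote
# count, preferring answers whose supporting CoTs are shorter on average.
from collections import defaultdict

def shortest_majority_vote(samples: list[tuple[str, int]]) -> str:
    cot_lengths = defaultdict(list)           # answer -> lengths of its CoTs
    for answer, cot_len in samples:
        cot_lengths[answer].append(cot_len)
    return max(cot_lengths,
               key=lambda a: (len(cot_lengths[a]),
                              -sum(cot_lengths[a]) / len(cot_lengths[a])))

# Two answers tie at 2 votes each; "17" wins because its chains are shorter.
print(shortest_majority_vote([("42", 900), ("42", 850), ("17", 120), ("17", 140)]))
```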
- A Dynamical Model of Neural Scaling Laws
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
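The setting in the entry above is easy to reproduce in miniature. The sketch below uses assumed specifics (random ReLU features, full-batch gradient descent, a linear teacher) that are not claimed to match the paper's exact model; it shows the train-test gap opening up as a finite training set is reused over many steps.

```python
# Minimal sketch of the setting above (assumed specifics: random ReLU
# features, full-batch GD, linear teacher). Reusing a finite train set drives
# train loss toward zero while test loss stalls, opening a gap over time.
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test = 20, 200, 100, 2000
W = rng.normal(size=(p, d)) / np.sqrt(d)          # fixed random feature weights
w_teacher = rng.normal(size=d) / np.sqrt(d)       # ground-truth linear target

def feats(X):
    return np.maximum(X @ W.T, 0.0)               # ReLU random features

X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = X_tr @ w_teacher, X_te @ w_teacher
F_tr, F_te = feats(X_tr), feats(X_te)

theta, lr = np.zeros(p), 0.02
for step in range(1, 3001):
    theta -= lr * F_tr.T @ (F_tr @ theta - y_tr) / n_train
    if step % 750 == 0:
        print(f"step {step:4d}  train {np.mean((F_tr @ theta - y_tr)**2):.4f}"
              f"  test {np.mean((F_te @ theta - y_te)**2):.4f}")
```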
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models
We propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
GoT captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes.
We evaluate GoT's performance on a text-only reasoning task and a multimodal reasoning task.
arXiv Detail & Related papers (2023-05-26T02:15:09Z)
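As a data-structure matter, the difference from chain-of-thought is that a thought may have multiple parents. A minimal illustrative container is below; the names and structure are my own, not the paper's GoT implementation, which learns graph representations.

```python
# Illustrative only: a tiny thought-graph container. The paper's GoT learns
# graph representations; this sketch just shows the structural difference
# from a chain, namely that a thought can aggregate several parents.
from dataclasses import dataclass, field

@dataclass
class ThoughtGraph:
    thoughts: dict = field(default_factory=dict)   # id -> thought text
    edges: set = field(default_factory=set)        # directed (src, dst) pairs

    def add(self, tid, text, supports=()):
        self.thoughts[tid] = text
        self.edges.update((src, tid) for src in supports)

g = ThoughtGraph()
g.add("t1", "16 apples were picked")
g.add("t2", "3 apples were eaten")
g.add("t3", "13 apples remain", supports=("t1", "t2"))  # two parents: a graph, not a chain
print(sorted(g.edges))
```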