On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
- URL: http://arxiv.org/abs/2507.05362v2
- Date: Sat, 01 Nov 2025 15:09:31 GMT
- Title: On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
- Authors: Riccardo Alberghi, Elizaveta Demyanenko, Luca Biggio, Luca Saglietti,
- Abstract summary: We study two key factors for improving reasoning in large language models.<n>We train decoder-only transformers on question-trace-answer triples using a custom tokenizer.<n>With the same training-token budget, models trained on inefficient traces generalize better to unseen graphs.
- Score: 4.798155648915794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in natural language processing highlight two key factors for improving reasoning in large language models (LLMs): (i) allocating more test-time compute tends to help on harder problems but often introduces redundancy in the reasoning trace, and (ii) compute is most effective when reasoning is systematic and incremental, forming structured chains of thought (CoTs) akin to human problem-solving. To study these factors in isolation, we introduce a controlled setting based on shortest-path tasks in layered graphs. We train decoder-only transformers on question-trace-answer triples using a custom tokenizer, comparing models trained on optimal bottom-up dynamic programming traces with those trained on longer, valid traces involving backtracking. Surprisingly, with the same training-token budget, models trained on inefficient traces generalize better to unseen graphs. This benefit is not due to length alone-injecting arbitrary redundancy into reasoning traces fails to help and can even hurt performance. Instead, we find that generalization correlates with the model's confidence in next-token prediction, suggesting that long, coherent, and locally incremental traces make the training signal easier to optimize.
Related papers
- Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning [16.95900718416944]
We introduce a novel Bidirectional Curriculum Generation framework to maximize the instructional value of every training sample.<n>Unlike rigid trajectories, our multi-agent ecosystem mimics adaptive pedagogy to establish a closed feedback loop.<n>This mechanism ensures that the model consumes only the most effective data at any given stage.
arXiv Detail & Related papers (2026-03-05T12:49:21Z) - Constraint-Rectified Training for Efficient Chain-of-Thought [60.52883907721588]
Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs)<n>While longer reasoning traces can improve answer quality and unlock abilities such as self-correction, they also incur high inference costs and often introduce redundant steps, known as overthinking.<n>Recent research seeks to develop efficient reasoning strategies that balance reasoning length and accuracy.
arXiv Detail & Related papers (2026-02-13T02:13:45Z) - ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning [30.786062954495403]
Large Reasoning Models (LRMs) often suffer from overthinking, generating unnecessarily long reasoning chains even for simple tasks.<n>We propose ENTRA, an entropy-based training framework that suppresses redundant reasoning while preserving performance.
arXiv Detail & Related papers (2026-01-12T01:26:30Z) - Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training [76.12556589212666]
We show that curriculum post-training avoids the exponential complexity bottleneck.<n>Under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with sample complexity.<n>We establish guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to order.
arXiv Detail & Related papers (2025-11-10T18:29:54Z) - Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search [62.1546099504045]
We propose a dual-phase test-time scaling framework that separates reasoning into planning and execution.<n>Specifically, we decompose reasoning trajectories and develop reward models for each phase, enabling the search to explore and prune plans and executions separately.<n> Experiments on both mathematical reasoning and code generation benchmarks demonstrate that our approach consistently improves accuracy while reducing computation redundant.
arXiv Detail & Related papers (2025-09-29T19:27:23Z) - Fast Thinking for Large Language Models [67.7238685892317]
We introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors.<n>At inference, the model conditions on a handful of continuous thinking switches distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens.
arXiv Detail & Related papers (2025-09-28T04:19:48Z) - Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity [23.225139930889522]
This work critically examines whether intermediate token sequence length reflects or correlates with problem difficulty.<n>We train transformer models from scratch on derivational traces of the A* search algorithm.<n>We find that even for the simplest tasks, they often produce excessively long reasoning traces and sometimes fail to generate a solution.
arXiv Detail & Related papers (2025-09-09T02:31:16Z) - Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation [82.2288581878096]
We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity.<n>We show that models can be endowed with such dynamic inference pathways without any architectural modifications.
arXiv Detail & Related papers (2025-09-05T16:40:13Z) - Multipole Attention for Efficient Long Context Reasoning [64.94673641704289]
Large Reasoning Models (LRMs) have shown promising accuracy improvements on complex problem-solving tasks.<n>LRMs need to generate long chain-of-thought reasoning in order to think before answering.<n>We introduce Multipole Attention, which accelerates autoregressive reasoning by only computing exact attention for the most important tokens.
arXiv Detail & Related papers (2025-06-16T03:00:40Z) - Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation [14.489157453882767]
This work aims to address the challenge of evaluating reasoning traces and their correlation with the final performance.<n>We employ a KD method leveraging rule-based problem decomposition to generate interpretable traces.<n>Specifically, we demonstrate this approach on Open Book QA, decomposing the problem into a Classification step and an Information Retrieval step.
arXiv Detail & Related papers (2025-05-20T00:49:19Z) - Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens [14.78605805191225]
We investigate how the semantics of intermediate tokens-often anthropomorphized as "thoughts" or reasoning traces-actually influence model performance.<n>We show that despite significant improvements on the solution-only baseline, models trained on entirely correct traces still produce invalid reasoning traces when arriving at correct solutions.
arXiv Detail & Related papers (2025-05-19T23:29:23Z) - Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z) - ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning [1.0416697066889342]
We propose a simple yet effective reinforcement learning method that enables reasoning models to learn their own optimal CoT lengths without manual supervision.<n>ShorterBetter achieves 50%-80% reduction in output lengths in both in-domain and out-of-domain reasoning tasks.<n>Our reasoning trace analysis shows that ShorterBetter refines the structure of the reasoning traces by reducing unnecessary repetition, excessive self-verification, and over-exploration of alternatives.
arXiv Detail & Related papers (2025-04-30T07:04:19Z) - To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning [31.21491548356213]
Backtracking naturally scales test-time compute by enabling sequential, linearized exploration via long chain-of-thought (CoT) generation.<n>Despite the growing adoption of sequential search, its advantages over parallel sampling remain poorly understood.<n>We show that models with backtracking capabilities benefit significantly from RL fine-tuning, while models without backtracking see limited, mixed gains.
arXiv Detail & Related papers (2025-04-09T17:12:49Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.<n>Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs.<n>We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints.<n>SoT achieves token reductions of up to 78% with minimal accuracy loss across 15 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z) - The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.<n>UPFT removes the need for labeled data or exhaustive sampling.<n> Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z) - Short-Term Memory Optimization in Recurrent Neural Networks by
Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.