Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
- URL: http://arxiv.org/abs/2602.06600v1
- Date: Fri, 06 Feb 2026 10:53:26 GMT
- Title: Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
- Authors: Zhuoyuan Hao, Zhuo Li, Wu Li, Fangming Liu, Min Zhang, Jing Li
- Abstract summary: Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. We analyze and harness the model's tendency to restate the question, which we term the \emph{Echo of Prompt} (EOP), as a front-loaded, compute-shaping mechanism.
- Score: 25.852162778115808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic ``thinking tokens'' and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain -- and often ignore -- the \emph{spontaneous} repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the \emph{Echo of Prompt (EOP)}, as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the \emph{Echo Likelihood Gap} $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link connecting early repetition to likelihood gains and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop \emph{Echo-Distilled SFT (ED-SFT)} to instill an ``echo-then-reason'' pattern through supervised finetuning, and \emph{Echoic Prompting (EP)} to re-ground the model mid-trace without training. While promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-prefix attention in middle layers, consistent with an \emph{attention refocusing} mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
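The Echo Likelihood Gap can be illustrated with a toy, copy-biased character model. This is a hypothetical sketch, not the paper's implementation: the real $\Delta\mathcal{L}$ is computed from an LRM's token log-probabilities, whereas here an add-one-smoothed empirical distribution over the context (with assumed constants `ALPHA` and `VOCAB`) stands in for the model, so that echoing the prompt measurably raises the likelihood of prompt-grounded continuations.

```python
import math
from collections import Counter

ALPHA = 1.0   # add-one smoothing constant (assumption for this toy model)
VOCAB = 27    # hypothetical vocabulary size: 26 lowercase letters + space

def log_likelihood(context: str, continuation: str) -> float:
    """Log-likelihood of `continuation` under a copy-biased toy model:
    p(tok | ctx) is the smoothed empirical frequency of tok in ctx."""
    total = 0.0
    for i, tok in enumerate(continuation):
        ctx = context + continuation[:i]
        counts = Counter(ctx)
        p = (counts[tok] + ALPHA) / (len(ctx) + ALPHA * VOCAB)
        total += math.log(p)
    return total

def echo_likelihood_gap(prompt: str, reasoning: str) -> float:
    """Toy Delta-L proxy: how much more likely the reasoning continuation
    becomes when the prompt is echoed once before reasoning begins."""
    with_echo = log_likelihood(prompt + " " + prompt, reasoning)
    without_echo = log_likelihood(prompt, reasoning)
    return with_echo - without_echo

gap = echo_likelihood_gap("what is two plus two", "two plus two")
print(gap)  # positive: echoing boosts prompt-grounded continuations
```

Under this toy model the gap is positive whenever the continuation reuses tokens that are frequent in the prompt, mirroring the paper's claim that the echo acts as a likelihood-raising anchor for prompt-grounded reasoning.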
Related papers
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$\nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop. $\nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z) - Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction [6.908972852063454]
Hybrid architectures combining state-space models with attention have achieved strong efficiency-quality tradeoffs. We present a surprising finding from analyzing GPT-2 models: \textbf{88%} of attention operations retrieve information already predictable from the model's hidden state. We introduce \textbf{CRAM} (\textbf{C}onsolidation-based \textbf{R}outing for \textbf{A}daptive \textbf{M}emory), a biologically inspired memory consolidation mechanism that gradually distills episodic retrievals into parametric semantic memory.
arXiv Detail & Related papers (2026-02-12T17:40:15Z) - Thinking Traps in Long Chain-of-Thought: A Measurable Study and Trap-Aware Adaptive Restart [27.904791075662896]
We introduce TAAR (Trap-Aware Adaptive Restart), a test-time control framework that trains a diagnostic policy to predict two signals from partial trajectories. At inference time, TAAR truncates the trajectory before the predicted trap segment and adaptively restarts decoding. Experiments show that TAAR improves reasoning performance without fine-tuning base model parameters.
arXiv Detail & Related papers (2026-01-17T07:26:02Z) - One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z) - Thinking Before You Speak: A Proactive Test-time Scaling Approach [54.8205006555199]
We implement our idea as a reasoning framework, named \emph{Thinking Before You Speak} (TBYS). We design a pipeline for automatically collecting and filtering in-context examples for the generation of \emph{insights}. Experiments on challenging mathematical datasets verify the effectiveness of TBYS.
arXiv Detail & Related papers (2025-08-26T03:43:32Z) - Reliability, Embeddedness, and Agency: A Utility-Driven Mathematical Framework for Agent-Centric AI Adoption [0.0]
We formalize three axioms for sustained adoption of agent-centric AI systems executing multi-step tasks. We model adoption as a sum of a decaying novelty term and a growing utility term.
arXiv Detail & Related papers (2025-08-18T12:53:38Z) - What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding [84.42056293290015]
We analyze the token-level misalignment between reasoning and non-reasoning models. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding. On four popular math-reasoning benchmarks, FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%.
arXiv Detail & Related papers (2025-06-08T05:08:32Z) - Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline (individual alignment, parameter-space merging, and domain-specific reinforcement learning) boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z) - Language Model Uncertainty Quantification with Attention Chain [9.093726246465117]
A large language model's (LLM) predictive uncertainty is crucial for judging the reliability of its answers. We propose UQAC, an efficient method that narrows the reasoning space to a tractable size for marginalization. We validate UQAC on multiple reasoning benchmarks with advanced open-source LLMs.
arXiv Detail & Related papers (2025-03-24T21:43:47Z) - Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=\tilde{O}(H)$, representing moderate error amplification. No computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e^{(\log H)^{1-\Omega(1)}}$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z) - FLARE: Faithful Logic-Aided Reasoning and Exploration [47.46564769245296]
We introduce a novel approach for traversing the problem space using task decompositions. We use large language models to plan a solution and soft-formalise the query into facts and predicates using logic programming code. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Training Chain-of-Thought via Latent-Variable Inference [30.21067593018967]
Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a ``chain-of-thought'' prompt.
Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers.
We propose a fine-tuning strategy that tries to maximize the \emph{marginal} log-likelihood of generating a correct answer using CoT prompting.
arXiv Detail & Related papers (2023-11-28T17:47:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.