Internal states before wait modulate reasoning patterns
- URL: http://arxiv.org/abs/2510.04128v1
- Date: Sun, 05 Oct 2025 10:03:42 GMT
- Title: Internal states before wait modulate reasoning patterns
- Authors: Dmitrii Troitskii, Koyena Pal, Chris Wendler, Callum Stuart McDougall, Neel Nanda
- Abstract summary: We train crosscoders at multiple layers of DeepSeek-R1-Distill-Llama-8B and introduce a latent attribution technique in the crosscoder setting. We locate a small set of features relevant for promoting/suppressing wait tokens' probabilities. We show that many of our identified features are indeed relevant for the reasoning process.
- Score: 14.272989515787351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work has shown that a significant driver of performance in reasoning models is their ability to reason and self-correct. A distinctive marker in these reasoning traces is the token wait, which often signals reasoning behavior such as backtracking. Although this is a complex behavior, little is understood about exactly why models do or do not decide to reason in this particular manner, which limits our understanding of what makes a reasoning model so effective. In this work, we address the question of whether a model's latents preceding wait tokens contain information relevant for modulating the subsequent reasoning process. We train crosscoders at multiple layers of DeepSeek-R1-Distill-Llama-8B and its base version, and introduce a latent attribution technique in the crosscoder setting. We locate a small set of features relevant for promoting or suppressing the probability of wait tokens. Finally, through a targeted series of experiments analyzing max-activating examples and causal interventions, we show that many of our identified features are indeed relevant for the reasoning process and give rise to different types of reasoning patterns, such as restarting from the beginning, recalling prior knowledge, expressing uncertainty, and double-checking.
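To make the attribution idea concrete, here is a minimal sketch of one way such a computation can be set up. It is illustrative only, not the authors' implementation: the crosscoder weights below are random stand-ins for trained ones, the layer index is arbitrary, and the tokenization of wait is an assumption. Each feature's influence on the wait logit is approximated as its activation times the dot product of its decoder direction with the gradient of that logit with respect to the residual stream.

```python
# Hedged sketch of latent attribution for the "wait" logit. W_enc/W_dec are
# random stand-ins for trained crosscoder weights; LAYER is arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # model studied in the paper
LAYER = 12                                          # hypothetical layer choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)  # heavy; a small model works too
model.eval()

d_model = model.config.hidden_size
d_feat = 8 * d_model
W_enc = torch.randn(d_model, d_feat) / d_model ** 0.5  # stand-in encoder
W_dec = torch.randn(d_feat, d_model) / d_feat ** 0.5   # stand-in decoder

wait_id = tok.encode(" wait", add_special_tokens=False)[0]  # assumed tokenization
ids = tok("So the area is 12. But", return_tensors="pt").input_ids

cache = {}
def grab(_, __, out):
    h = out[0] if isinstance(out, tuple) else out
    h.retain_grad()            # keep gradients on the residual stream
    cache["h"] = h
hook = model.model.layers[LAYER].register_forward_hook(grab)
logits = model(ids).logits     # grad must stay enabled for the backward pass
hook.remove()

logits[0, -1, wait_id].backward()       # d(wait logit) / d(residual stream)
h = cache["h"][0, -1]                   # residual at the last position
grad = cache["h"].grad[0, -1]
acts = torch.relu(h @ W_enc)            # crosscoder feature activations
attribution = acts * (W_dec @ grad)     # a_i * (d_i . grad), one score per feature
print(attribution.topk(5).indices)      # candidate wait-promoting features
```

In the paper's setting, the same attribution would be computed with crosscoders trained jointly on the distilled model and its base version rather than with random weights.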
Related papers
- Fluid Representations in Reasoning Models [91.77876704697779]
We present a mechanistic analysis of how QwQ-32B processes abstract structural information. We find that QwQ-32B gradually improves its internal representation of actions and concepts during reasoning.
arXiv Detail & Related papers (2026-02-04T18:34:50Z)
- Do LLMs Encode Functional Importance of Reasoning Tokens? [11.21558453188654]
We propose greedy pruning, a likelihood-preserving deletion procedure that iteratively removes reasoning tokens. We show that students trained on pruned chains outperform a frontier-model-supervised compression baseline at matched reasoning lengths.
arXiv Detail & Related papers (2026-01-06T14:50:02Z)
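One plausible reading of the greedy pruning above, as a hedged sketch: repeatedly delete whichever reasoning token best preserves the log-likelihood of the final answer, and stop when no deletion stays within tolerance. The stand-in model (gpt2), the tolerance rule, and all names are assumptions, not the paper's setup.

```python
# Hedged sketch of a greedy, likelihood-preserving pruning loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def answer_logprob(prefix_ids, answer_ids):
    """Sum of log-probs the model assigns to answer_ids following prefix_ids."""
    ids = torch.cat([prefix_ids, answer_ids]).unsqueeze(0)
    logps = torch.log_softmax(model(ids).logits[0], dim=-1)
    start = prefix_ids.numel()
    # logits at position t-1 predict the token at position t
    return sum(logps[start + i - 1, answer_ids[i]].item()
               for i in range(answer_ids.numel()))

def greedy_prune(question, reasoning, answer, tol=0.05):
    q = tok(question, return_tensors="pt").input_ids[0]
    r = tok(reasoning, return_tensors="pt").input_ids[0].tolist()
    a = tok(answer, return_tensors="pt").input_ids[0]
    base = answer_logprob(torch.cat([q, torch.tensor(r, dtype=torch.long)]), a)
    changed = True
    while changed and r:
        changed = False
        for i in range(len(r)):
            cand = r[:i] + r[i + 1:]
            lp = answer_logprob(
                torch.cat([q, torch.tensor(cand, dtype=torch.long)]), a)
            if lp >= base - tol:   # deletion (roughly) preserves likelihood
                r, base, changed = cand, lp, True
                break
    return tok.decode(r)

print(greedy_prune("Q: 2+2? ", "Let me think. 2 plus 2 is 4. ", "A: 4"))
```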
- Temporal Predictors of Outcome in Reasoning Language Models [0.0]
The chain-of-thought (CoT) paradigm uses the elicitation of step-by-step rationales as a proxy for reasoning. We show that, for harder questions, a drop in predictive accuracy highlights a selection artifact. Overall, our results imply that, for reasoning models, internal self-assessment of success tends to emerge after only a few tokens.
arXiv Detail & Related papers (2025-11-03T08:57:18Z)
- Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting [38.93424884988798]
We show that large reasoning models overthink, continuing to revise answers even after reaching the correct solution. We propose Entropy After </Think> (EAT) for monitoring and deciding whether to exit reasoning early. EAT reduces token usage by 13-21% without harming accuracy.
arXiv Detail & Related papers (2025-09-30T16:59:37Z)
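A hedged sketch of how such a monitor could work, based only on the abstract: append the end-of-thought marker as a probe and measure the entropy of the model's next-token distribution; a low value suggests the answer is already determined. The probe string, threshold, and single-position entropy are all assumptions; the paper may aggregate entropy differently.

```python
# Hedged sketch of an EAT-style early-exit check.
import torch

@torch.no_grad()
def eat_entropy(model, tok, reasoning_ids):
    """reasoning_ids: (1, seq) tensor of the reasoning trace generated so far."""
    probe = tok("</think>", add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([reasoning_ids, probe.to(reasoning_ids.device)], dim=-1)
    p = torch.softmax(model(ids).logits[0, -1], dim=-1)
    return -(p * torch.log(p.clamp_min(1e-12))).sum().item()

def should_exit(model, tok, reasoning_ids, threshold=1.5):  # threshold: a guess
    """Call periodically during generation; True means stop reasoning early."""
    return eat_entropy(model, tok, reasoning_ids) < threshold
```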
- From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models [48.01707022738742]
We conduct a three-stage investigation into the interplay between reasoning and answer generation in three distilled DeepSeek R1 models. We demonstrate that including explicit reasoning consistently improves answer quality across diverse domains. Our results show that perturbations to key reasoning tokens can reliably alter the final answers.
arXiv Detail & Related papers (2025-09-28T06:32:21Z)
- Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling [60.63703438729223]
We show how different architectures and training methods affect a model's multi-step reasoning capabilities. We confirm that increasing model depth plays a crucial role for sequential computations.
arXiv Detail & Related papers (2025-08-22T18:57:08Z)
- Lost at the Beginning of Reasoning [85.17612793300238]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps.
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
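The sampling strategy described above might look roughly like the following sketch; `sample_first_step` and `reward_model_score` are hypothetical stand-ins for the paper's generator and reward model.

```python
# Hedged sketch of reward-guided first-step selection: draw several candidate
# first reasoning steps, keep the one the reward model scores highest, then
# continue decoding from it. Both callables are hypothetical stand-ins.
from typing import Callable

def pick_first_step(sample_first_step: Callable[[], str],
                    reward_model_score: Callable[[str], float],
                    n: int = 8) -> str:
    candidates = [sample_first_step() for _ in range(n)]
    return max(candidates, key=reward_model_score)
```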
- Think Clearly: Improving Reasoning via Redundant Token Pruning [57.01254508252785]
We show that deliberately removing redundancy in the reasoning process significantly improves performance. We demonstrate that our method improves overall accuracy across reasoning-intensive benchmarks without any training.
arXiv Detail & Related papers (2025-06-17T06:04:01Z)
- On Reasoning Strength Planning in Large Reasoning Models [50.61816666920207]
We find evidence that LRMs pre-plan the reasoning strength in their activations even before generation. We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors.
arXiv Detail & Related papers (2025-06-10T02:55:13Z)
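A pre-allocated direction like this invites a simple intervention sketch: add a scaled copy of the direction to the residual stream during generation. Everything below is illustrative; in practice the direction would be estimated from activations (for example, by contrasting long and short reasoning traces), not drawn at random.

```python
# Hedged sketch of steering a Llama-style model along a "reasoning strength"
# direction. The direction, layer index, and scale alpha are placeholders.
import torch

def add_steering_hook(model, layer_idx, direction, alpha):
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return model.model.layers[layer_idx].register_forward_hook(hook)

# usage sketch:
# handle = add_steering_hook(model, 20, torch.randn(model.config.hidden_size), 4.0)
# out = model.generate(**inputs, max_new_tokens=256)
# handle.remove()
```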
- The Geometry of Self-Verification in a Task-Specific Reasoning Model [45.669264589017665]
We train a model using DeepSeek R1's recipe on the CountDown task. We perform top-down and bottom-up analyses to reverse-engineer how the model verifies its outputs.
arXiv Detail & Related papers (2025-04-19T18:40:51Z)
- Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data. We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z)
- Exposing Attention Glitches with Flip-Flop Language Modeling [55.0688535574859]
This work identifies and analyzes the phenomenon of attention glitches in large language models.
We introduce flip-flop language modeling (FFLM), a family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models.
We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques.
arXiv Detail & Related papers (2023-06-01T17:44:35Z)
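For concreteness, here is a hedged reconstruction of flip-flop string generation from the abstract's description; the token names and operation mix are assumptions. Write operations store a bit, ignore operations add distractor symbols, and each read must echo the most recently written bit.

```python
# Hedged reconstruction of a flip-flop language modeling (FFLM) string:
# "w b" writes bit b, "i b" is a distractor, "r b" must repeat the last write.
import random

def flip_flop_string(n_ops=16, seed=0):
    rng = random.Random(seed)
    toks, mem = [], "0"
    for _ in range(n_ops):
        op = rng.choice(["w", "i", "r"])
        if op == "r":
            toks += ["r", mem]          # read: correct continuation is mem
        else:
            bit = rng.choice("01")
            toks += [op, bit]
            if op == "w":
                mem = bit               # only writes update the memory cell
    return " ".join(toks)

print(flip_flop_string())   # e.g. "w 1 i 0 r 1 ..."
```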