Oops, Wait: Token-Level Signals as a Lens into LLM Reasoning
- URL: http://arxiv.org/abs/2601.17421v1
- Date: Sat, 24 Jan 2026 11:43:09 GMT
- Title: Oops, Wait: Token-Level Signals as a Lens into LLM Reasoning
- Authors: Jaehui Hwang, Dongyoon Han, Sangdoo Yun, Byeongho Heo,
- Abstract summary: discourse-like tokens such as "wait" and "therefore" in large language models (LLMs) have offered a unique window into their reasoning processes.<n>We analyze token-level signals through token probabilities across various models.
- Score: 61.76889440384448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of discourse-like tokens such as "wait" and "therefore" in large language models (LLMs) has offered a unique window into their reasoning processes. However, systematic analyses of how such signals vary across training strategies and model scales remain lacking. In this paper, we analyze token-level signals through token probabilities across various models. We find that specific tokens strongly correlate with reasoning correctness, varying with training strategies while remaining stable across model scales. A closer look at the "wait" token in relation to answer probability demonstrates that models fine-tuned on small-scale datasets acquire reasoning ability through such signals but exploit them only partially. This work provides a systematic lens to observe and understand the dynamics of LLM reasoning.
Related papers
- How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors.<n>We introduce a novel benchmark that decomposes reasoning into atomic core skills.<n>We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z) - Modeling Hierarchical Thinking in Large Reasoning Models [2.429493364781869]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities when they generate step-by-step solutions.<n>When trained to using chain-of-thought reasoning examples, the resulting models appear to learn hierarchical thinking strategies similar to those used by humans.<n>In this paper, we adopt a memoryless Finite State Machine formulation to approximate LRM's emerging hierarchical reasoning dynamics as a structured, interpretable abstraction.
arXiv Detail & Related papers (2025-10-25T21:25:30Z) - MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts [82.46857666702924]
We present a new paradigm for reasoning in large language models (LLMs)<n>Instead of autoregressively generating tokens, we model reasoning as a hidden Markov chain of continuous, high-dimensional "thoughts"<n>For the first time, MARCOS achieves performance comparable to token-based CoT, even surpassing it by 4.7% on GSM8K with up to 15.7x speedup in inference.
arXiv Detail & Related papers (2025-09-29T16:44:22Z) - Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation [85.82112629564942]
We propose TokenBridge, which maintains the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens.<n>We introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism.<n>Our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction.
arXiv Detail & Related papers (2025-03-20T17:59:59Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence.<n>We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Self-supervised Analogical Learning using Language Models [59.64260218737556]
We propose SAL, a self-supervised analogical learning framework.<n> SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions.<n>We show that the resulting models outperform base language models on a wide range of reasoning benchmarks.
arXiv Detail & Related papers (2025-02-03T02:31:26Z) - Forking Paths in Neural Text Generation [14.75166317633176]
We develop a novel approach to representing uncertainty dynamics across individual tokens of text generation.<n>We use our method to analyze LLM responses on 7 different tasks across 4 domains.<n>We find many examples of forking tokens, including surprising ones such as punctuation marks.
arXiv Detail & Related papers (2024-12-10T22:57:57Z) - Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.<n>We present a novel framework for identifying these tokens through rollout sampling.<n>We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z) - LLMs are Not Just Next Token Predictors [0.0]
LLMs are statistical models of language learning through gradient descent with a next token prediction objective.
While LLMs are engineered using next token prediction, and trained based on their success at this task, our view is that a reduction to just next token predictor sells LLMs short.
In order to draw this out, we will make an analogy with a once prominent research program in biology explaining evolution and development from the gene's eye view.
arXiv Detail & Related papers (2024-08-06T16:36:28Z) - Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning [8.609587510471943]
We introduce a novel and interpretable analysis of internal multi-hop reasoning processes in large language models.
We show that during inference, the middle layers of the network generate highly interpretable embeddings.
Our findings can help uncover the strategies that LLMs use to solve reasoning tasks, offering insights into the types of thought processes that can emerge from artificial intelligence.
arXiv Detail & Related papers (2024-06-19T21:36:40Z) - Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models [4.165536532090932]
The disconnect between tokenizer creation and model training in language models allows for specific inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted model behaviour.
We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting this issue of detecting under-trained tokens.
Through a combination of tokenizer analysis, model weight-based indicators, and prompting techniques, we develop novel and effective methods for automatically detecting these problematic tokens.
arXiv Detail & Related papers (2024-05-08T20:37:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.