Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models
- URL: http://arxiv.org/abs/2601.18383v1
- Date: Mon, 26 Jan 2026 11:31:40 GMT
- Title: Dynamic Thinking-Token Selection for Efficient Reasoning in Large Reasoning Models
- Authors: Zhenyuan Guo, Tong Chen, Wenlong Meng, Chen Gong, Xin Yu, Chengkun Wei, Wenzhi Chen,
- Abstract summary: Large Reasoning Models (LRMs) excel at solving complex problems by explicitly generating a reasoning trace before deriving the final answer.<n>However, extended generations incur substantial memory footprint and computational overhead, bottlenecking LRMs' efficiency.<n>This work uses attention maps to analyze the influence of reasoning traces and uncover an interesting phenomenon.<n>Only some decision-critical tokens in a reasoning trace steer the model toward the final answer, while the remaining tokens contribute negligibly.
- Score: 29.494777688316674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Reasoning Models (LRMs) excel at solving complex problems by explicitly generating a reasoning trace before deriving the final answer. However, these extended generations incur substantial memory footprint and computational overhead, bottlenecking LRMs' efficiency. This work uses attention maps to analyze the influence of reasoning traces and uncover an interesting phenomenon: only some decision-critical tokens in a reasoning trace steer the model toward the final answer, while the remaining tokens contribute negligibly. Building on this observation, we propose Dynamic Thinking-Token Selection (DynTS). This method identifies decision-critical tokens and retains only their associated Key-Value (KV) cache states during inference, evicting the remaining redundant entries to optimize efficiency.
Related papers
- Do LLMs Encode Functional Importance of Reasoning Tokens? [11.21558453188654]
We propose greedy pruning, a likelihood-preserving deletion procedure that iteratively removes reasoning tokens.<n>We show that students trained on pruned chains outperform a frontier-model-supervised compression baseline at matched reasoning lengths.
arXiv Detail & Related papers (2026-01-06T14:50:02Z) - Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs [8.359909829007005]
We investigate whether efficient reasoning strategies introduce behavioral inconsistencies in large reasoning models (LRMs)<n>$ICBENCH$ is a benchmark designed to measure inconsistency in LRMs across three dimensions.<n>We find that while larger models generally exhibit greater consistency than smaller ones, they all display widespread "scheming" behaviors.
arXiv Detail & Related papers (2025-06-24T10:25:28Z) - Think Clearly: Improving Reasoning via Redundant Token Pruning [57.01254508252785]
We show that deliberately removing redundancy in the reasoning process significantly improves performance.<n>We demonstrate that our method significantly improves overall accuracy across reasoning-intensive benchmarks without any training.
arXiv Detail & Related papers (2025-06-17T06:04:01Z) - On Reasoning Strength Planning in Large Reasoning Models [50.61816666920207]
We find evidence that LRMs pre-plan the reasoning strengths in their activations even before generation.<n>We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model.<n>Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors.
arXiv Detail & Related papers (2025-06-10T02:55:13Z) - PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty.<n>It learns to compress reasoning length in accordance with scene complexity and predictive confidence.<n> Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z) - LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling [39.61818305829112]
We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step.<n>PIR identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components.<n>Our approach demonstrates strong generalizability across different model sizes, data sources, and token budgets.
arXiv Detail & Related papers (2025-05-25T15:17:57Z) - Efficient Inference for Large Reasoning Models: A Survey [74.17203483365171]
Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason.<n>However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time.<n>This survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving the reasoning quality.
arXiv Detail & Related papers (2025-03-29T13:27:46Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [49.61246073215651]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.<n>Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.<n>However, they also introduce significant computational overhead due to verbose and redundant outputs.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism [68.05754701230039]
We construct a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models.<n>We propose a random matrix-based algorithm to enhance the model's reasoning ability.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.