Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
- URL: http://arxiv.org/abs/2506.08343v2
- Date: Wed, 18 Jun 2025 14:43:36 GMT
- Title: Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
- Authors: Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou
- Abstract summary: We examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by 27%-51% across five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
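Though the paper's exact implementation isn't reproduced here, logit-level suppression of this kind is straightforward to sketch with Hugging Face transformers: ban the token ids of the reflection keywords at every decoding step. The model name and keyword list below are illustrative assumptions, not the authors' configuration.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class SuppressTokensProcessor(LogitsProcessor):
    """Bans the given token ids by setting their logits to -inf at every step."""
    def __init__(self, banned_ids):
        self.banned_ids = banned_ids

    def __call__(self, input_ids, scores):
        scores[:, self.banned_ids] = float("-inf")
        return scores

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # illustrative R1-style model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Gather single-token encodings of the reflection keywords, with and without a
# leading space (tokenizers encode those differently). Multi-token variants
# would need n-gram banning and are skipped in this sketch.
keywords = ["Wait", " Wait", "wait", " wait", "Hmm", " Hmm"]
banned = []
for word in keywords:
    ids = tokenizer.encode(word, add_special_tokens=False)
    if len(ids) == 1:
        banned.append(ids[0])

inputs = tokenizer("Solve step by step: what is 17 * 24?", return_tensors="pt")
out = model.generate(
    **inputs, max_new_tokens=512,
    logits_processor=LogitsProcessorList([SuppressTokensProcessor(banned)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the processor only masks logits, it composes with any sampling strategy and adds no measurable decoding overhead.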
Related papers
- Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model
Large Reasoning Models (LRMs) excel at solving complex problems but face an overthinking dilemma: when handling simple tasks, they often produce verbose responses overloaded with thinking tokens. These tokens trigger unnecessary high-level reasoning behaviors like reflection and backtracking, reducing efficiency.
arXiv Detail & Related papers (2025-06-30T13:30:33Z)
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation
Recent advancements in large reasoning models (LRMs) have achieved notable performance gains on complex reasoning tasks by scaling up generation length via chain-of-thought (CoT). We propose ConciseHint, a framework that continuously encourages the reasoning model to speak concisely by injecting textual hints during token generation. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method produces concise reasoning processes while maintaining performance.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
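From the abstract alone, the injection mechanism can be approximated as decode-in-chunks-then-splice: generate a block of tokens, append a brevity hint to the running context, and continue. A minimal sketch under those assumptions (the hint wording, chunk size, and model are mine, not the paper's):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-8B"  # illustrative; the paper evaluates DeepSeek-R1 and Qwen-3 series
HINT = "\n(Keep the reasoning concise.)\n"
CHUNK = 128          # decode this many tokens between hint injections
MAX_TOKENS = 1024

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
hint_ids = tokenizer(HINT, add_special_tokens=False, return_tensors="pt").input_ids

ids = tokenizer("Question: what is 17 * 24?", return_tensors="pt").input_ids
generated = 0
while generated < MAX_TOKENS:
    out = model.generate(ids, max_new_tokens=CHUNK, do_sample=False)
    new_tokens = out[0, ids.shape[1]:]
    generated += new_tokens.shape[0]
    if tokenizer.eos_token_id in new_tokens:
        ids = out
        break  # the model finished; stop injecting
    # Splice the concise hint into the running context before continuing.
    ids = torch.cat([out, hint_ids], dim=-1)

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```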
- Think Clearly: Improving Reasoning via Redundant Token Pruning
We show that deliberately removing redundancy in the reasoning process significantly improves performance, demonstrating improved overall accuracy across reasoning-intensive benchmarks without any training.
arXiv Detail & Related papers (2025-06-17T06:04:01Z)
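The abstract doesn't describe the pruning mechanism, so the sketch below is only one illustrative reading: drop reasoning steps that are near-duplicates of steps already kept, a crude text-level form of redundancy removal. The similarity measure and threshold are invented for the example.

```python
import difflib

def prune_redundant_steps(steps, threshold=0.8):
    """Keep a step only if it isn't highly similar to any step already kept."""
    kept = []
    for step in steps:
        if all(difflib.SequenceMatcher(None, step, k).ratio() < threshold for k in kept):
            kept.append(step)
    return kept

trace = [
    "Compute 17 * 24 by splitting: 17 * 20 = 340.",
    "Then 17 * 4 = 68, so the total is 340 + 68 = 408.",
    "Let me verify: compute 17 * 24 by splitting: 17 * 20 = 340.",  # near-duplicate re-check
    "So the answer is 408.",
]
print(prune_redundant_steps(trace))  # the redundant re-check is dropped
```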
- Multipole Attention for Efficient Long Context Reasoning
Large Reasoning Models (LRMs) have shown promising accuracy improvements on complex problem-solving tasks, but they need to generate long chain-of-thought reasoning in order to think before answering. We introduce Multipole Attention, which accelerates autoregressive reasoning by computing exact attention only for the most important tokens.
arXiv Detail & Related papers (2025-06-16T03:00:40Z)
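A toy, single-head rendition of the near-field/far-field idea: compute exact attention over the top-k most relevant keys and let one mean centroid, weighted by how many tokens it represents, stand in for the rest. The real method is considerably more elaborate; shapes and k here are arbitrary.

```python
import torch

def multipole_attention(q, K, V, k=16):
    """q: (d,), K and V: (n, d). Exact attention over the top-k keys; one mean
    centroid approximates the remaining "far field" keys."""
    scale = q.shape[0] ** 0.5
    relevance = K @ q                                   # (n,) unnormalized scores
    top = torch.topk(relevance, min(k, K.shape[0])).indices
    mask = torch.zeros(K.shape[0], dtype=torch.bool)
    mask[top] = True

    keys, vals = K[mask], V[mask]
    logits = (keys @ q) / scale                          # exact near-field logits
    far_K, far_V = K[~mask], V[~mask]
    if far_K.shape[0] > 0:
        # One centroid key/value for the far field; add log(count) to its logit
        # so it carries roughly the weight of the tokens it replaces.
        centroid_logit = (far_K.mean(0) @ q) / scale \
            + torch.log(torch.tensor(float(far_K.shape[0])))
        logits = torch.cat([logits, centroid_logit.unsqueeze(0)])
        vals = torch.cat([vals, far_V.mean(0, keepdim=True)])
    w = torch.softmax(logits, dim=0)
    return w @ vals

d, n = 64, 256
q, K, V = torch.randn(d), torch.randn(n, d), torch.randn(n, d)
print(multipole_attention(q, K, V).shape)  # torch.Size([64])
```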
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance, which raises a natural question: does thinking more at test time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking, followed by a decline due to overthinking.
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
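The extension mechanism being probed can be sketched as follows: whenever the model closes its thinking block, strip the close marker, append "Wait," and let it continue, up to a fixed number of extensions. The model name and the "</think>" delimiter are assumptions about an R1-style chat format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def generate_with_extensions(prompt, max_extensions=2, chunk=512):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_extensions):
        ids = model.generate(ids, max_new_tokens=chunk, do_sample=False)
        text = tokenizer.decode(ids[0])
        if "</think>" not in text:
            break  # still thinking; nothing to force yet
        # Remove the close-of-thinking marker and nudge the model onward.
        text = text.rsplit("</think>", 1)[0] + "\nWait,"
        ids = tokenizer(text, return_tensors="pt").input_ids
    # Final pass: let the model close its thinking and answer normally.
    out = model.generate(ids, max_new_tokens=chunk, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate_with_extensions("Solve step by step: what is 17 * 24?"))
```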
- Answer Convergence as a Signal for Early Stopping in Reasoning
Chain-of-thought (CoT) prompting enhances reasoning in large language models (LLMs). We propose three inference-time strategies to improve efficiency: (1) early stopping via answer consistency, (2) boosting the probability of generating end-of-reasoning signals, and (3) a supervised method that learns when to stop based on internal activations.
arXiv Detail & Related papers (2025-06-03T07:20:54Z)
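Strategy (1) lends itself to a short sketch: every so many decoded tokens, probe the model for its current answer, and stop as soon as two consecutive probes agree. The probe prompt, chunk size, and model below are assumptions, not the paper's setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
PROBE = "\nTherefore, the final answer is"

def probe_answer(ids, max_tokens=16):
    """Append an answer prompt to the partial trace and read off the answer."""
    probe_ids = tokenizer(tokenizer.decode(ids[0]) + PROBE,
                          return_tensors="pt").input_ids
    out = model.generate(probe_ids, max_new_tokens=max_tokens, do_sample=False)
    return tokenizer.decode(out[0, probe_ids.shape[1]:],
                            skip_special_tokens=True).strip()

def reason_with_early_stop(prompt, chunk=128, max_chunks=16):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    last = None
    for _ in range(max_chunks):
        ids = model.generate(ids, max_new_tokens=chunk, do_sample=False)
        answer = probe_answer(ids)
        if answer == last:   # two consecutive probes agree: answer has converged
            return answer
        last = answer
    return last

print(reason_with_early_stop("Solve step by step: what is 17 * 24?"))
```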
- Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks, but they often exhibit overthinking: performing unnecessary reasoning steps even after arriving at the correct answer. We present a quantitative analysis of overthinking from the perspective of self-doubt and introduce a simple, effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z)
- CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and low token efficiency. We identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
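The title suggests a two-stage pipeline in which an instruct model drafts guidance that the reasoning model then follows. A minimal sketch of that reading against an OpenAI-compatible endpoint (model names, prompts, and the local server URL are placeholders, not the paper's):

```python
from openai import OpenAI

# e.g. a local vLLM server exposing an OpenAI-compatible API
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def cothink(question):
    # Stage 1: a lightweight instruct model sketches a solution outline.
    outline = client.chat.completions.create(
        model="instruct-model",  # placeholder name
        messages=[{"role": "user",
                   "content": f"Give a brief solution outline (no full solution):\n{question}"}],
    ).choices[0].message.content
    # Stage 2: the reasoning model solves, guided by the outline.
    answer = client.chat.completions.create(
        model="reasoning-model",  # placeholder name
        messages=[{"role": "user",
                   "content": f"{question}\n\nFollow this outline and keep the reasoning brief:\n{outline}"}],
    ).choices[0].message.content
    return answer

print(cothink("What is 17 * 24?"))
```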
- Efficient Reasoning via Chain of Unconscious Thought
Large Reasoning Models (LRMs) achieve promising performance but compromise token efficiency due to verbose reasoning processes. We propose a new reasoning paradigm, termed Chain of Unconscious Thought (CoUT), to improve the token efficiency of LRMs. Our work reveals that models may possess beneficial unconscious thought, enabling improved efficiency without sacrificing performance.
arXiv Detail & Related papers (2025-05-26T09:34:04Z)
- Let LLMs Break Free from Overthinking via Self-Braking Tuning
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought, but this performance gain comes at the cost of a substantial increase in redundant reasoning during generation. We propose Self-Braking Tuning (SBT), a novel framework that tackles overthinking by allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z)
- ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
We propose a simple yet effective reinforcement learning method that enables reasoning models to learn their own optimal CoT lengths without manual supervision. ShorterBetter achieves a 50%-80% reduction in output lengths on both in-domain and out-of-domain reasoning tasks. Our reasoning-trace analysis shows that ShorterBetter refines the structure of reasoning traces by reducing unnecessary repetition, excessive self-verification, and over-exploration of alternatives.
arXiv Detail & Related papers (2025-04-30T07:04:19Z)
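One way such a length reward could look, reading the abstract: within a group of sampled responses, take the length of the shortest correct one as the target and penalize deviation from it. The reward shape and constants below are assumptions for illustration.

```python
def group_rewards(responses, is_correct, lengths, alpha=0.001):
    """responses: list of str; is_correct and lengths: parallel lists."""
    correct_lengths = [l for l, ok in zip(lengths, is_correct) if ok]
    if not correct_lengths:
        # No correct sample in the group: fall back to a zero reward for all.
        return [0.0 for _ in responses]
    target = min(correct_lengths)  # the group's "optimal" inference length
    rewards = []
    for ok, l in zip(is_correct, lengths):
        base = 1.0 if ok else 0.0
        rewards.append(base - alpha * abs(l - target))  # penalize length deviation
    return rewards

# Example: three sampled answers to one problem.
print(group_rewards(["a", "b", "c"], [True, True, False], [800, 300, 1200]))
# -> [0.5, 1.0, -0.9]: the shortest correct answer earns the highest reward
```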
- LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
We introduce LongPerceptualThoughts, a new synthetic dataset with 30K long-thought traces for perceptual tasks, built with a novel three-stage data synthesis framework that first synthesizes verifiable multiple-choice questions. We demonstrate notable improvements over existing visual reasoning data-generation methods.
arXiv Detail & Related papers (2025-04-21T18:10:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.