Related papers: SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation

URL: http://arxiv.org/abs/2601.03649v1
Date: Wed, 07 Jan 2026 07:00:15 GMT
Title: SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation
Authors: Gengyang Li, Wang Cai, Yifeng Gao, Yunfang Wu
Abstract summary: We present SyncThink, a training-free and plug-and-play decoding method that reduces Chain-of-Thought overhead without modifying model weights. We find that answer tokens attend weakly to early reasoning and instead focus on the special token "</think>", indicating an information bottleneck. Experiments on GSM8K, MMLU, GPQA, and BBH across three DeepSeek-R1 distilled models show that SyncThink achieves 62.00 percent average Top-1 accuracy.
Score: 11.021989271617835
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) prompting improves reasoning but often produces long and redundant traces that substantially increase inference cost. We present SyncThink, a training-free and plug-and-play decoding method that reduces CoT overhead without modifying model weights. We find that answer tokens attend weakly to early reasoning and instead focus on the special token "</think>", indicating an information bottleneck. Building on this observation, SyncThink monitors the model's own reasoning-transition signal and terminates reasoning. Experiments on GSM8K, MMLU, GPQA, and BBH across three DeepSeek-R1 distilled models show that SyncThink achieves 62.00 percent average Top-1 accuracy using 656 generated tokens and 28.68 s latency, compared to 61.22 percent, 2141 tokens, and 92.01 s for full CoT decoding. On long-horizon tasks such as GPQA, SyncThink can further yield up to +8.1 absolute accuracy by preventing over-thinking.
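The abstract reports absolute figures only. As a quick worked example, the relative savings they imply can be computed directly; the variable and function names below are illustrative and not taken from the paper, and the inputs are the averages quoted in the abstract.

```python
# Averages as quoted in the abstract (across benchmarks and models).
full_cot = {"accuracy": 61.22, "tokens": 2141, "latency_s": 92.01}
syncthink = {"accuracy": 62.00, "tokens": 656, "latency_s": 28.68}


def relative_reduction(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (baseline - value) / baseline


token_reduction = relative_reduction(full_cot["tokens"], syncthink["tokens"])
latency_reduction = relative_reduction(full_cot["latency_s"], syncthink["latency_s"])
speedup = full_cot["latency_s"] / syncthink["latency_s"]
accuracy_delta = syncthink["accuracy"] - full_cot["accuracy"]

print(f"generated tokens reduced by {token_reduction:.1f}%")
print(f"latency reduced by {latency_reduction:.1f}% ({speedup:.2f}x faster)")
print(f"accuracy change: {accuracy_delta:+.2f} points")
```

On these numbers, SyncThink generates roughly 69% fewer tokens and runs about 3.2 times faster than full CoT decoding, with a 0.78-point accuracy gain.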

Related papers

Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens [12.788799173865]
We quantify inference-time effort by identifying deep-thinking tokens. Think@n is a test-time scaling strategy that prioritizes samples with high deep-thinking ratios.
arXiv Detail & Related papers (2026-02-13T23:07:37Z)
Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning [57.57084309580296]
Thinking-Based Non-Thinking (TNT) sets different maximum token usage for responses not using thinking across various queries. Experiments on five mathematical benchmarks demonstrate that TNT reduces token usage by around 50%. Among TNT responses classified as not using thinking, the probability of reward hacking remains below 10%.
arXiv Detail & Related papers (2026-01-08T10:38:41Z)
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning [11.179446105672461]
We propose a multi-stage efficient reasoning method that combines supervised fine-tuning and reinforcement learning. Our approach reduces response length by an average of 28% for 8B models and 40% for 32B models. It achieves a superior trade-off compared to more complex state-of-the-art efficient reasoning methods.
arXiv Detail & Related papers (2026-01-06T12:31:51Z)
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning [15.597220136913258]
LYNX is an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. We train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks.
arXiv Detail & Related papers (2025-12-05T00:04:42Z)
Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning [0.0]
Chain-of-Thought (CoT) prompting is a key technique for enabling complex reasoning in large language models. We introduce LEASH: Logit-Entropy Adaptive Stopping Heuristic, a training-free decoding algorithm that adaptively halts rationale generation.
arXiv Detail & Related papers (2025-11-06T18:43:16Z)
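The LEASH summary names only the signal (logit entropy) and the goal (halting rationale generation); the actual stopping rule is not given in this listing. The following is a minimal sketch of what entropy-based halting could look like in general, not the LEASH algorithm itself. The `threshold` and `patience` parameters and the function names are hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def halting_step(logit_steps, threshold=0.5, patience=2):
    """Return the index of the first step at which entropy has stayed below
    `threshold` for `patience` consecutive steps, or None if that never happens.
    Both parameters are assumptions made for this sketch."""
    streak = 0
    for i, logits in enumerate(logit_steps):
        if entropy(logits) < threshold:
            streak += 1
            if streak >= patience:
                return i
        else:
            streak = 0
    return None


# Toy per-step logits over a 4-token vocabulary.
steps = [
    [1.0, 1.0, 1.0, 1.0],  # uniform: high entropy
    [5.0, 0.0, 0.0, 0.0],  # peaked: low entropy
    [1.0, 1.0, 1.0, 1.0],  # high entropy again, streak resets
    [6.0, 0.0, 0.0, 0.0],  # low entropy
    [7.0, 0.0, 0.0, 0.0],  # low entropy, second in a row
]
print(halting_step(steps))
```

Requiring several consecutive low-entropy steps rather than a single one is a design choice in this sketch to avoid halting on an isolated confident token.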
DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching [54.98126916293868]
Large Reasoning Models (LRMs) produce excessively long chain-of-thought traces that degrade accuracy. We propose a model-agnostic decoding framework that sketches the reasoning space by branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution that enhances both efficiency and accuracy, without requiring additional training or supervision.
arXiv Detail & Related papers (2025-11-01T17:41:28Z)
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning [134.03095505580276]
Doing Length pEnalty Right (DLER) is a training recipe combining batch-wise reward normalization, higher clipping, dynamic sampling, and a simple truncation length penalty. DLER achieves state-of-the-art accuracy-efficiency trade-offs, cutting output length by over 70 percent while surpassing the accuracy of all previous baselines.
arXiv Detail & Related papers (2025-10-16T20:05:57Z)
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding [84.42056293290015]
We analyze the token-level misalignment between reasoning and non-reasoning models. Motivated by the Local Misalignment Diminish, we propose FoReaL-Decoding. On four popular math-reasoning benchmarks, FoReaL-Decoding reduces theoretical FLOPs by 30 to 50% and trims CoT length by up to 40%.
arXiv Detail & Related papers (2025-06-08T05:08:32Z)
The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models [54.88805865447848]
We show that instruct models achieve higher efficiency overall, and problem difficulty affects efficiency. We propose COTHINK, a simple two-stage pipeline: an instruct model drafts a brief outline, and a thinking model expands it. On GSM8K, MATH500, and AIME24, COTHINK cuts token usage by 21.1% while keeping accuracy on four thinking models, and remains competitive with strong efficiency baselines.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
VeriThinker: Learning to Verify Makes Reasoning Model Efficient [52.74493506816969]
Large Reasoning Models excel at complex tasks using Chain-of-Thought (CoT) reasoning. Their tendency to overthink leads to unnecessarily lengthy reasoning chains. We introduce VeriThinker, a novel approach for CoT compression.
arXiv Detail & Related papers (2025-05-23T14:17:56Z)
Not All Tokens Are What You Need In Thinking [34.767739567093656]
Conditional Token Selection (CTS) identifies and preserves only the most essential tokens in chains of thought. CTS effectively compresses long CoT while maintaining strong reasoning performance. Further reducing training tokens by 42% incurs only a marginal 5% accuracy drop while yielding a 75.8% reduction in reasoning tokens.
arXiv Detail & Related papers (2025-05-23T12:41:29Z)
CoT-Valve: Length-Compressible Chain-of-Thought Tuning [50.196317781229496]
We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. We show that CoT-Valve successfully enables controllability and compressibility of the chain and performs better than prompt-based control.
arXiv Detail & Related papers (2025-02-13T18:52:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.