VeriThinker: Learning to Verify Makes Reasoning Model Efficient
- URL: http://arxiv.org/abs/2505.17941v1
- Date: Fri, 23 May 2025 14:17:56 GMT
- Title: VeriThinker: Learning to Verify Makes Reasoning Model Efficient
- Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
- Abstract summary: Large Reasoning Models excel at complex tasks using Chain-of-Thought (CoT) reasoning. Their tendency to overthink leads to unnecessarily lengthy reasoning chains. We introduce VeriThinker, a novel approach for CoT compression.
- Score: 52.74493506816969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought (CoT) reasoning. However, their tendency to overthink leads to unnecessarily lengthy reasoning chains, dramatically increasing inference costs. To mitigate this issue, we introduce VeriThinker, a novel approach for CoT compression. Unlike conventional methods that fine-tune LRMs directly on the original reasoning task using synthetic concise CoT data, we innovatively fine-tune the model solely through an auxiliary verification task. By training LRMs to accurately verify the correctness of CoT solutions, the LRMs inherently become more discerning about the necessity of subsequent self-reflection steps, thereby effectively suppressing overthinking. Extensive experiments validate that VeriThinker substantially reduces reasoning chain lengths while maintaining or even slightly improving accuracy. When applied to DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500 from 3790 to 2125 while improving accuracy by 0.8% (94.0% to 94.8%), and on AIME25, tokens decrease from 14321 to 10287 with a 2.1% accuracy gain (38.7% to 40.8%). Additionally, our experiments demonstrate that VeriThinker can also be zero-shot generalized to speculative reasoning. Code is available at https://github.com/czg1225/VeriThinker
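As a rough illustration of the training setup described in the abstract, the sketch below formats supervised fine-tuning examples for an auxiliary CoT-verification task from hypothetical (problem, candidate solution, correctness label) triples. It is a minimal sketch of the idea, not the authors' released pipeline; the actual data construction and training recipe are in the repository linked above.

```python
# Illustrative sketch (not the authors' exact recipe): building SFT examples
# for an auxiliary CoT-verification task. The model is trained to judge a
# candidate chain-of-thought rather than to solve the original problem.
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationExample:
    problem: str        # original question
    candidate_cot: str  # a complete chain-of-thought solution to verify
    is_correct: bool    # gold correctness label for the candidate solution


def to_sft_pair(ex: VerificationExample) -> dict:
    """Format one verification example as a (prompt, target) pair for SFT."""
    prompt = (
        "Problem:\n" + ex.problem + "\n\n"
        "Candidate solution:\n" + ex.candidate_cot + "\n\n"
        "Is the candidate solution correct? Answer 'correct' or 'incorrect'."
    )
    target = "correct" if ex.is_correct else "incorrect"
    return {"prompt": prompt, "target": target}


def build_dataset(examples: List[VerificationExample]) -> List[dict]:
    return [to_sft_pair(ex) for ex in examples]


if __name__ == "__main__":
    demo = VerificationExample(
        problem="What is 17 * 6?",
        candidate_cot="17 * 6 = 17 * 5 + 17 = 85 + 17 = 102.",
        is_correct=True,
    )
    print(to_sft_pair(demo)["prompt"])
```

Training on verification pairs of this kind, rather than on shortened solutions, is what the abstract credits with making the model more discerning about when further self-reflection is actually needed.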
Related papers
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [53.149817480019834]
Recent advancements in large reasoning models (LRMs) have achieved notable performance gains on complex reasoning tasks by scaling up the generation length of Chain-of-Thought (CoT). We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting a textual hint during token generation of the reasoning process. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method effectively produces concise reasoning processes while preserving performance.
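A minimal sketch of the hint-injection idea from this summary, assuming a stub `sample_next_token` decoding step, a fixed hint string, and a fixed injection interval; the actual ConciseHint framework adapts the hint and where it is injected, which is not modeled here.

```python
# Sketch: periodically inject a concise-style textual hint into the decoding
# context so that later tokens are conditioned on the instruction to be brief.
import random
from typing import List

HINT = " (keep the reasoning brief) "
INJECT_EVERY = 64  # assumed interval, in generated tokens


def sample_next_token(context: List[str]) -> str:
    """Placeholder for one decoding step of a real reasoning model."""
    vocab = ["step", "so", "therefore", "=", "answer", "</s>"]
    return random.choice(vocab)


def generate_with_hints(prompt: str, max_tokens: int = 256) -> str:
    context: List[str] = prompt.split()
    generated = 0
    while generated < max_tokens:
        token = sample_next_token(context)
        if token == "</s>":
            break
        context.append(token)
        generated += 1
        if generated % INJECT_EVERY == 0:
            # Append the hint to the running context mid-generation.
            context.extend(HINT.split())
    return " ".join(context)


if __name__ == "__main__":
    print(generate_with_hints("Solve: 2 + 2 ="))
```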
arXiv Detail & Related papers (2025-06-23T16:20:44Z) - What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding [84.42056293290015]
We analyze the token-level misalignment between reasoning and non-reasoning models. Motivated by the observation that this local misalignment diminishes along the sequence, we propose FoReaL-Decoding. On four popular math-reasoning benchmarks, FoReaL-Decoding reduces theoretical FLOPs by 30-50% and trims CoT length by up to 40%.
arXiv Detail & Related papers (2025-06-08T05:08:32Z) - CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and leading to low token efficiency. We identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
arXiv Detail & Related papers (2025-05-28T06:24:45Z) - Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition [11.858707687894757]
Large Reasoning Models (LRMs) are criticized for the excessively lengthy Chain-of-Thought (CoT) they generate to derive the final answer. This paper introduces Multi-Turn Decomposition (MinD) to decode conventional CoT into a sequence of explicit, structured, and turn-wise interactions. MinD can achieve up to a 70% reduction in both output token usage and time to first token (TTFT).
arXiv Detail & Related papers (2025-05-26T10:18:57Z) - Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting [28.537281448659634]
We propose a difficulty-aware prompting (DAP) method to dynamically shorten reasoning traces without performance loss. In experiments, a student model fine-tuned on just 100K difficulty-pruned CoT samples outperforms a model distilled on 800K original long CoT samples. Our method also generalizes well: across 11 diverse benchmarks, the shorter difficulty-aware CoTs achieve equal or better accuracy than long chains while using far fewer tokens.
arXiv Detail & Related papers (2025-05-26T09:04:44Z) - Not All Tokens Are What You Need In Thinking [34.767739567093656]
Conditional Token Selection (CTS) identifies and preserves only the most essential tokens in chains of thought. CTS effectively compresses long CoT while maintaining strong reasoning performance. Further reducing training tokens by 42% incurs only a marginal 5% accuracy drop while yielding a 75.8% reduction in reasoning tokens.
arXiv Detail & Related papers (2025-05-23T12:41:29Z) - Process Reward Models That Think [86.88809596842428]
Step-by-step verifiers -- also known as process reward models (PRMs) -- are a key ingredient for test-time scaling. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs.
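The step-wise generative verification described above could look roughly like the following sketch, with a stub `verifier` standing in for the fine-tuned long-CoT verifier; the prompt format and the judgment parsing are assumptions made only for illustration.

```python
# Sketch: a verbalized, step-wise verifier that generates a verification CoT
# for each solution step and extracts a per-step correctness judgment.
from typing import List


def verifier(prompt: str) -> str:
    """Placeholder for the verifier model's generated verification CoT."""
    return "The step follows from the previous one. Judgment: correct"


def verify_solution(problem: str, steps: List[str]) -> List[bool]:
    """Return one correctness verdict per step of a candidate solution."""
    verdicts = []
    for i, step in enumerate(steps, start=1):
        prompt = (
            f"Problem: {problem}\n"
            f"Steps so far: {' '.join(steps[:i])}\n"
            f"Think step by step and judge whether step {i} is correct."
        )
        verification_cot = verifier(prompt)
        verdicts.append("Judgment: correct" in verification_cot)
    return verdicts


if __name__ == "__main__":
    print(verify_solution("2 + 3 * 4 = ?", ["3 * 4 = 12", "2 + 12 = 14"]))
```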
arXiv Detail & Related papers (2025-04-23T15:44:54Z) - Dynamic Early Exit in Reasoning Models [13.982812528756504]
Overthinking in long chain-of-thought (CoT) generation not only reduces the efficiency of problem solving but also risks accuracy loss. We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by exiting early during generation. Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs.
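A hedged sketch of the early-exit idea: after each reasoning chunk, probe whether the model is already confident in a final answer and stop if so. Both `generate_chunk` and `probe_answer_confidence` are stubs, and the confidence threshold is an assumed parameter; the cited paper's actual exit criterion may differ.

```python
# Sketch: training-free early exit during CoT generation. Reasoning is
# generated chunk by chunk; after each chunk an answer probe is run, and
# generation stops once the probed answer is confident enough.
from typing import Tuple


def generate_chunk(context: str) -> str:
    """Placeholder for decoding one reasoning chunk (e.g., one paragraph)."""
    return " ... next reasoning step ..."


def probe_answer_confidence(context: str) -> Tuple[str, float]:
    """Placeholder: force an answer attempt and return (answer, confidence)."""
    return "42", 0.93


def generate_with_early_exit(prompt: str,
                             max_chunks: int = 32,
                             threshold: float = 0.9) -> str:
    context = prompt
    for _ in range(max_chunks):
        context += generate_chunk(context)
        answer, confidence = probe_answer_confidence(context)
        if confidence >= threshold:
            # Confident enough: truncate the CoT and emit the answer now.
            return context + f"\nFinal answer: {answer}"
    answer, _ = probe_answer_confidence(context)
    return context + f"\nFinal answer: {answer}"


if __name__ == "__main__":
    print(generate_with_early_exit("Question: 6 * 7 = ?"))
```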
arXiv Detail & Related papers (2025-04-22T13:36:53Z) - SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities on complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while delivering significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z) - LLMs Can Easily Learn to Reason from Demonstrations; Structure, not content, is what matters! [53.84130385074551]
Large reasoning models (LRMs) tackle complex reasoning problems by following long chains of thought (Long CoT). We find that a large language model (LLM) can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank adaptation (LoRA). With just 17k Long CoT training samples, the Qwen2.5-32B-Instruct model achieves significant improvements on a wide range of math and coding benchmarks.
arXiv Detail & Related papers (2025-02-11T08:48:48Z)