Dynamic Early Exit in Reasoning Models
- URL: http://arxiv.org/abs/2504.15895v1
- Date: Tue, 22 Apr 2025 13:36:53 GMT
- Title: Dynamic Early Exit in Reasoning Models
- Authors: Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Zheng Lin, Li Cao, Weiping Wang,
- Abstract summary: Overthinking in long chain-of-thought (CoT) generation slows down the efficiency of problem solving.<n>We propose a method that allows LLMs to self-truncate CoT sequences by early exit during generation.<n>Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs.
- Score: 14.508648537186989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large reasoning language models (LRLMs) rely on test-time scaling, which extends long chain-of-thought (CoT) generation to solve complex tasks. However, overthinking in long CoT not only slows down the efficiency of problem solving, but also risks accuracy loss due to the extremely detailed or redundant reasoning steps. We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by early exit during generation. Instead of relying on fixed heuristics, the proposed method monitors model behavior at potential reasoning transition points (e.g.,"Wait" tokens) and dynamically terminates the next reasoning chain's generation when the model exhibits high confidence in a trial answer. Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs. Experiments on multiple reasoning benchmarks MATH-500, AMC 2023, GPQA Diamond and AIME 2024 show that the proposed method is consistently effective on deepseek-series reasoning LLMs, reducing the length of CoT sequences by an average of 31% to 43% while improving accuracy by 1.7% to 5.7%.
Related papers
- ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning [1.170732359523702]
Reasoning models such as OpenAI o3 and DeepSeek-R1 have demonstrated strong performance on reasoning-intensive tasks.
Long reasoning traces can facilitate a more thorough exploration of solution paths for complex problems.
We introduce ShorterBetter, a simple yet effective reinforcement learning methed that enables reasoning language models to discover their own optimal CoT lengths.
arXiv Detail & Related papers (2025-04-30T07:04:19Z) - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.
We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.
For reasoning models, majority voting proves to be a robust inference strategy, generally competitive or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z) - The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.
UPFT removes the need for labeled data or exhaustive sampling.
Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z) - Self-Training Elicits Concise Reasoning in Large Language Models [23.475414693530965]
Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens.<n>We propose simple fine-tuning methods which leverage self-generated concise reasoning paths.<n>Our method achieves a 30% reduction in output tokens, across five model families on GSM8K and MATH, while maintaining average accuracy.
arXiv Detail & Related papers (2025-02-27T14:14:50Z) - When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs)<n>However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy?<n>In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
arXiv Detail & Related papers (2025-02-11T05:28:59Z) - O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [98.3430004984531]
We propose Length-Harmonizing Fine-Tuning (O1-Pruner) to minimize reasoning overhead while maintaining accuracy.<n>Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner.
arXiv Detail & Related papers (2025-01-22T01:35:11Z) - Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps.
We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution.
We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z) - Improve Mathematical Reasoning in Language Models by Automated Process Supervision [23.807288360423193]
We propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named textitOmegaPRM for the efficient collection of high-quality process supervision data.<n>We are able to collect over 1.5 million process supervision annotations to train Process Reward Models (PRMs)<n>This fully automated process supervision alongside the weighted self-consistency algorithm is able to enhance LLMs' math reasoning performances.
arXiv Detail & Related papers (2024-06-05T19:25:40Z) - Chain of Evidences and Evidence to Generate: Prompting for Context Grounded and Retrieval Augmented Reasoning [3.117335706912261]
Chain of Evidences (CoE) and Evidence to Generate (E2G) are built upon two unique strategies.<n>Instead of unverified reasoning claims, our innovative approaches leverage the power of "evidence for decision making"<n>Our framework consistently achieves remarkable results across various knowledge-intensive reasoning and generation tasks.
arXiv Detail & Related papers (2024-01-11T09:49:15Z) - Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models [73.4425450752596]
Chain-of-thought (CoT) prompting has impressively unlocked the reasoning potential of large language models (LLMs)
Yet, the standard CoT is less effective in problems demanding multiple reasoning steps.
We propose RESPROMPT, a new prompting strategy that advances multi-step reasoning in LLMs.
arXiv Detail & Related papers (2023-10-07T08:56:28Z) - Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Model (LLM)
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34%$, $9.56%$, and $5.46%$ on the GSM8K, AQuA, and StrategyQA.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.