Related papers: Let LRMs Break Free from Overthinking via Self-Braking Tuning

URL: http://arxiv.org/abs/2505.14604v4
Date: Thu, 30 Oct 2025 02:36:10 GMT
Title: Let LRMs Break Free from Overthinking via Self-Braking Tuning
Authors: Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang
Abstract summary: Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
Score: 68.93713497579853
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large reasoning models (LRMs), such as OpenAI o1 and DeepSeek-R1, have significantly enhanced their reasoning capabilities by generating longer chains of thought, demonstrating outstanding performance across a variety of tasks. However, this performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process, leading to high computational overhead and exacerbating the issue of overthinking. Although numerous existing approaches aim to address the problem of overthinking, they often rely on external interventions. In this paper, we propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process, thus eliminating the reliance on external control mechanisms. We construct a set of overthinking identification metrics based on standard answers and design a systematic method to detect redundant reasoning. This method accurately identifies unnecessary steps within the reasoning trajectory and generates training signals for learning self-regulation behaviors. Building on this foundation, we develop a complete strategy for constructing data with adaptive reasoning lengths and introduce an innovative braking prompt mechanism that enables the model to naturally learn when to terminate reasoning at an appropriate point. Experiments across mathematical benchmarks (AIME, AMC, MATH500, GSM8K) demonstrate that our method reduces token consumption by up to 60% while maintaining comparable accuracy to unconstrained models.
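To make the idea of answer-based overthinking detection concrete, here is a minimal sketch. It is not the paper's actual metric. It assumes the reasoning trace has already been split into steps, treats the first step that contains the standard answer as the point where reasoning could have stopped, and reports the share of tokens generated after that point. The function name `redundancy_ratio`, the whitespace tokenization, and the substring match are all illustrative assumptions.

```python
from typing import List, Optional, Tuple


def redundancy_ratio(steps: List[str], standard_answer: str) -> Tuple[Optional[int], float]:
    """Return (index of first step containing the answer, fraction of tokens after it).

    Hypothetical helper: whitespace tokens and substring matching are
    simplifying assumptions, not the metrics defined in the paper.
    """
    token_counts = [len(step.split()) for step in steps]
    total = sum(token_counts)
    for i, step in enumerate(steps):
        if standard_answer in step:
            # Everything after the first correct step is counted as redundant.
            after = sum(token_counts[i + 1:])
            return i, (after / total if total else 0.0)
    # The answer never appears, so no step is counted as redundant.
    return None, 0.0


# Toy trace: the answer is reached in the first step, then re-verified twice.
trace = [
    "Compute 12 * 7 = 84",
    "So the answer is 84",
    "Wait, let me double check: 7 * 12 = 84",
    "Confirming again, the result is 84",
]
first_hit, ratio = redundancy_ratio(trace, "84")
```

A high ratio on a trace like this one would flag the later steps as candidates for truncation when building training data with shorter reasoning lengths. A real pipeline would need a proper tokenizer and a more robust answer matcher.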

Related papers

Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs [46.272771457924186]
We propose Draft-Thinking, which guides models to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Experiments demonstrate that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance.
arXiv Detail & Related papers (2026-02-28T09:57:52Z)
Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning [62.680551162054975]
We introduce an end-to-end framework where LLMs learn to self-regulate the granularity of the reasoning steps through dynamic summarization. We apply reinforcement learning to incentivize this capability further, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows. Our Accordion-Thinker demonstrates that with learned self-compression, LLMs can tackle complex reasoning tasks with minimal dependency token overhead.
arXiv Detail & Related papers (2026-02-03T08:34:20Z)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths. GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation [82.62935304152239]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. They often suffer from overthinking, that is, generating unnecessarily lengthy reasoning steps for simpler problems. We introduce a novel metric, Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process.
arXiv Detail & Related papers (2025-10-02T17:36:50Z)
From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control [11.321315058502215]
Large Reasoning Models (LRMs) have demonstrated a latent capacity for complex reasoning by spontaneously exhibiting cognitive behaviors such as step-by-step reasoning, reflection, and backtracking, commonly referred to as "Aha Moments". However, such emergent behaviors remain unregulated and uncontrolled, often resulting in overthinking, where the model continues generating redundant reasoning content even after reaching reliable conclusions. Current models are unable to monitor and adaptively manage their reasoning process to determine when to continue, backtrack, or terminate. We propose the Meta-cognitive Reasoning Framework (MERA), which explicitly decouples the thinking process into distinct
arXiv Detail & Related papers (2025-08-06T13:59:17Z)
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models [23.642200042199484]
We propose Thinking with Nothinking (JointThinking) as an in-context learning (ICL) paradigm for reasoning large language models (RLLMs). Our method prompts the model to generate two answers in parallel: one in Thinking mode and the other in Nothinking mode. JointThinking significantly outperforms few-shot chain-of-thought (CoT) and majority voting with improved answer robustness.
arXiv Detail & Related papers (2025-08-05T12:09:55Z)
Lost at the Beginning of Reasoning [82.18834329384514]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps. We introduce a new benchmark specifically constructed with deliberately flawed first reasoning steps to systematically evaluate model self-correction capabilities.
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models [103.03315678501546]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking".
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation [33.008513399946914]
OThink-R1 is a method that prunes redundant reasoning steps while preserving logical validity. Experiments across mathematical and question-answering tasks demonstrate that OThink-R1 reduces reasoning redundancy by almost 23% on average.
arXiv Detail & Related papers (2025-06-03T03:31:30Z)
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and leading to low token efficiency. We identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
arXiv Detail & Related papers (2025-05-28T06:24:45Z)
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL [19.731871225975926]
Large reasoning models (LRMs) are proficient at generating explicit, step-by-step reasoning sequences before producing final answers. To address this over-thinking problem, we explore how to equip LRMs with adaptive thinking capabilities. We propose AutoThink, a multi-stage reinforcement learning framework that progressively optimizes reasoning policies.
arXiv Detail & Related papers (2025-05-16T04:01:57Z)
Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases--thinking and solution--with independently allocated budgets. It produces more concise and efficient reasoning even in unconstrained settings.
arXiv Detail & Related papers (2025-05-08T15:01:06Z)
Reasoning Models Can Be Effective Without Thinking [45.411955744222524]
We find that bypassing the thinking process via simple prompting, denoted as NoThinking, can be surprisingly effective. Our method outperforms a range of baselines with similar latency using Thinking, and is comparable to Thinking with significantly longer latency (up to 9x).
arXiv Detail & Related papers (2025-04-14T04:08:16Z)
Effectively Controlling Reasoning Models through Thinking Intervention [41.38412282063417]
Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs.
arXiv Detail & Related papers (2025-03-31T17:50:13Z)
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [96.27754404942364]
Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs. We observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement.
arXiv Detail & Related papers (2025-02-12T09:23:26Z)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.