SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
- URL: http://arxiv.org/abs/2510.19767v1
- Date: Wed, 22 Oct 2025 16:56:01 GMT
- Title: SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
- Authors: Xichen Zhang, Sitong Wu, Haoru Tan, Shaozuo Yu, Yinghao Zhu, Ziyi He, Jiaya Jia
- Abstract summary: Long chain-of-thought (LongCoT) is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. We propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution.
- Score: 49.290631188365786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models exhibit shallow reasoning by frequently switching thoughts without sufficient exploration, limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution, continuously monitoring the model's reasoning process to detect underthinking and guide it toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought is found to be prematurely abandoned, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks demonstrate that our method significantly enhances the performance of various large language models of different sizes.
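The perceive-then-intervene loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_step` and `score_thought` are hypothetical callables standing in for the language model and the process reward model, and the switch cues, the threshold value, the intervention cap, and the wording of the deepening prompt are all invented for the example.

```python
from typing import Callable, List, Optional, Sequence

# Assumption: a thought switch is detected by simple lexical cues.
SWITCH_CUES = ("alternatively", "wait", "another approach")
# Assumption: the wording of the deepening prompt is hypothetical.
DEEPENING_PROMPT = "This line of reasoning looks promising. Let me explore it further."


def is_switch(step: str) -> bool:
    """Return True if a reasoning step opens with a thought-switch cue."""
    return step.strip().lower().startswith(SWITCH_CUES)


def smart_switch(
    generate_step: Callable[[str, Sequence[str]], Optional[str]],
    score_thought: Callable[[str, Sequence[str]], float],
    question: str,
    threshold: float = 0.7,      # hypothetical PRM cutoff
    max_steps: int = 32,
    max_interventions: int = 3,  # hypothetical cap to guarantee termination
) -> List[str]:
    steps: List[str] = []
    thought_start = 0   # index of the first step of the current thought
    interventions = 0
    while len(steps) < max_steps:
        step = generate_step(question, steps)
        if step is None:  # the model finished its reasoning
            break
        if is_switch(step):
            # Perception: score the thought that is about to be abandoned.
            has_thought = len(steps) > thought_start
            if has_thought and interventions < max_interventions:
                potential = score_thought(question, steps[thought_start:])
                if potential >= threshold:
                    # Intervention: drop the switch step (backtrack) and
                    # insert the deepening prompt so the thought continues.
                    steps.append(DEEPENING_PROMPT)
                    interventions += 1
                    continue
            # The switch is accepted; a new thought starts at this step.
            thought_start = len(steps)
        steps.append(step)
    return steps


if __name__ == "__main__":
    scripted = iter(["Let x = 2.", "Alternatively, try geometry.", "So x^2 = 4.", None])
    trace = smart_switch(lambda q, s: next(scripted), lambda q, t: 0.9, "toy question")
    print(trace)
```

In this toy run the stub reward model rates the first thought highly, so the "Alternatively" step is discarded and replaced by the deepening prompt. With a low score the switch would be kept as generated.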
Related papers
- Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks [54.31998314008198]
Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks. We attribute this limitation to reasoning overconfidence: a tendency to express undue certainty in an incomplete solution set. We propose the cognitive-rigidity hypothesis, which posits that overconfidence arises when the reasoning process prematurely converges on a narrow set of thought paths.
arXiv Detail & Related papers (2025-12-01T14:35:06Z)
- Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation [82.62935304152239]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. They often suffer from overthinking, that is, generating unnecessarily lengthy reasoning steps for simpler problems. We introduce a novel metric, Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process.
arXiv Detail & Related papers (2025-10-02T17:36:50Z)
- Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts [16.941385792353493]
Large Language Models (LLMs) excel at problem solving by generating chains of thought in natural language. Recent work proposes a latent thinking architecture, Huginn-3.5B, which represents intermediate reasoning steps as a sequence of latent representations. We show how Huginn-3.5B thinks in the latent space and how external supervision signals can improve its latent thinking processes.
arXiv Detail & Related papers (2025-09-30T14:26:36Z)
- Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models [28.756240721942138]
Reasoning large language models (RLLMs) have recently demonstrated remarkable capabilities through structured and multi-step reasoning. We propose Thinking with Nothinking (JointThinking), a new ICL paradigm that prompts the model to generate two answers in parallel. JointThinking significantly outperforms few-shot chain-of-thought (CoT), thinking twice and majority voting.
arXiv Detail & Related papers (2025-08-05T12:09:55Z)
- Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models [130.5487886246353]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking".
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
- Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z)
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs [86.79757571440082]
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks. We identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts. We propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts.
arXiv Detail & Related papers (2025-01-30T18:58:18Z)
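The last entry above describes a decoding-time penalty on thought switches. One way such a penalty could look is sketched here, purely as an illustration: the summary only says that premature transitions are discouraged, so the function name, the idea of subtracting a constant from the logits of switch-signalling tokens, and the parameters `alpha` (penalty strength) and `beta` (window length in tokens) are assumptions made for this example.

```python
from typing import Dict, Set


def apply_switch_penalty(
    logits: Dict[str, float],
    switch_tokens: Set[str],
    tokens_since_thought_start: int,
    alpha: float = 3.0,  # hypothetical penalty strength
    beta: int = 600,     # hypothetical window (in tokens) where the penalty applies
) -> Dict[str, float]:
    """Lower the logits of switch-signalling tokens early in a thought.

    Outside the window the logits are returned unchanged, so the model
    remains free to switch once a thought has been explored for a while.
    """
    if tokens_since_thought_start >= beta:
        return dict(logits)
    return {
        token: (value - alpha if token in switch_tokens else value)
        for token, value in logits.items()
    }


if __name__ == "__main__":
    toy_logits = {"alternatively": 2.0, "so": 1.0}
    print(apply_switch_penalty(toy_logits, {"alternatively"}, 10, alpha=3.0, beta=100))
```

A dictionary keyed by token strings keeps the sketch dependency-free; a real decoder would apply the same idea to a logit tensor indexed by token id.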
This list is automatically generated from the titles and abstracts of the papers on this site.