Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
- URL: http://arxiv.org/abs/2510.02249v1
- Date: Thu, 02 Oct 2025 17:36:50 GMT
- Title: Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
- Authors: Tianyi Jiang, Yi Bin, Yujuan Ding, Kainian Zhu, Fei Ma, Jingkuan Song, Heng Tao Shen,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning.<n>They often suffer from overthinking, meaning generating unnecessarily lengthy reasoning steps for simpler problems.<n>We introduce a novel metric Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process.
- Score: 82.62935304152239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. However, they often suffer from overthinking, meaning generating unnecessarily lengthy reasoning steps for simpler problems. This issue may degrade the efficiency of the models and make them difficult to adapt the reasoning depth to the complexity of problems. To address this, we introduce a novel metric Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process. We further propose a novel reasoning paradigm -- Explore Briefly, Then Decide -- with an associated Cumulative Entropy Regulation (CER) mechanism. This paradigm leverages TECA to help the model dynamically determine the optimal point to conclude its thought process and provide a final answer, thus achieving efficient reasoning. Experimental results across diverse mathematical benchmarks show that our approach substantially mitigates overthinking without sacrificing problem-solving ability. With our thinking paradigm, the average response length decreases by up to 71% on simpler datasets, demonstrating the effectiveness of our method in creating a more efficient and adaptive reasoning process.
Related papers
- MixReasoning: Switching Modes to Think [79.70845484191543]
Reasoning models enhance performance by tackling problems in a step-by-step manner.<n>Applying extended reasoning to every step introduces substantial redundancy.<n>We propose MixReasoning, a framework that dynamically adjusts the depth of reasoning within a single response.
arXiv Detail & Related papers (2025-10-07T15:46:34Z) - Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models [28.578488403845146]
Large Audio Language Models (LALMs) have shown remarkable reasoning capabilities.<n>We propose a difficulty-adaptive reasoning method for LALMs.
arXiv Detail & Related papers (2025-09-26T06:49:14Z) - Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey [8.736170026262279]
Large reasoning models (LRMs) like OpenAI o1 and DeepSeek R1 have demonstrated impressive performance on complex reasoning tasks.<n>These models also face a huge challenge that generating unnecessarily lengthy and redundant reasoning chains.
arXiv Detail & Related papers (2025-07-13T14:51:59Z) - PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty.<n>It learns to compress reasoning length in accordance with scene complexity and predictive confidence.<n> Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z) - Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks.<n>They often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer.<n>We present a quantitative analysis of overthinking from the perspective of self-doubt.<n>We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z) - Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought.<n>This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process.<n>We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z) - Think When You Need: Self-Adaptive Chain-of-Thought Learning [20.22448368125018]
Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems.<n>We identify that existing approaches directly penalizing reasoning length fail to account for varying problem complexity.<n>Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness with conciseness.
arXiv Detail & Related papers (2025-04-04T07:34:01Z) - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs [86.79757571440082]
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks.<n>We identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts.<n>We propose a decoding strategy with thought switching penalty TIP that discourages premature transitions between thoughts.
arXiv Detail & Related papers (2025-01-30T18:58:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.