Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
- URL: http://arxiv.org/abs/2510.07880v2
- Date: Fri, 10 Oct 2025 21:36:08 GMT
- Title: Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
- Authors: Xinliang Frederick Zhang, Anhad Mohananey, Alexandra Chronopoulou, Pinelopi Papalampidi, Somit Gupta, Tsendsuren Munkhdalai, Lu Wang, Shyam Upadhyay
- Abstract summary: Long chain-of-thought (CoT) models often engage in unnecessarily extensive reasoning even for simple queries. To bridge this gap, this study introduces TRACE, a systematic, fine-grained analyzer of LLMs' thought processes. We propose a utility-based definition of overthinking that moves beyond length-based metrics.
- Score: 46.43570276604168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models employing long chain-of-thought (CoT) reasoning have shown superior performance on complex reasoning tasks. Yet this capability introduces a critical and often overlooked inefficiency -- overthinking -- in which models engage in unnecessarily extensive reasoning even for simple queries, incurring significant computation without accuracy improvements. While prior work has explored solutions to mitigate overthinking, a fundamental gap remains in our understanding of its underlying causes. Most existing analyses are limited to superficial, profiling-based observations and fail to delve into LLMs' inner workings. To bridge this gap, this study introduces TRACE, a systematic, fine-grained analyzer of LLMs' thought processes. We first benchmark the overthinking issue, confirming that long-thinking models are five to twenty times slower on simple tasks with no substantial gains. We then use TRACE to decompose the thought process into minimally complete sub-thoughts. Next, by inferring discourse relationships among sub-thoughts, we construct granular thought-progression graphs and identify common thinking patterns for topically similar queries. Our analysis reveals two major patterns in open-weight thinking models -- Explorer and Late Landing -- providing evidence that over-verification and over-exploration are the primary drivers of overthinking in LLMs. Grounded in these thought structures, we propose a utility-based definition of overthinking that moves beyond length-based metrics. This revised definition offers a more insightful understanding of LLMs' thought progression, as well as practical guidelines for principled overthinking management.
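The pipeline the abstract describes (decompose a chain of thought into sub-thoughts, infer discourse relations between them, build a thought-progression graph, then judge overthinking by utility rather than length) can be sketched in miniature. The sketch below is a hypothetical illustration, not the authors' TRACE implementation: the sentence-level decomposition, the keyword-cue relation classifier, and the utility score that counts sub-thoughts occurring after the first conclusion are all simple stand-ins for the learned components the paper describes.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ThoughtGraph:
    sub_thoughts: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_idx, dst_idx, relation)

def decompose(cot_text):
    """Naive decomposition: one sub-thought per sentence-like unit."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", cot_text) if s.strip()]

def infer_relation(prev, curr):
    """Toy discourse classifier: keyword cues stand in for a learned model."""
    lowered = curr.lower()
    if lowered.startswith(("wait", "but", "alternatively", "hmm")):
        return "exploration"
    if lowered.startswith(("so", "therefore", "thus")):
        return "conclusion"
    if "check" in lowered or "verify" in lowered:
        return "verification"
    return "continuation"

def build_graph(cot_text):
    """Chain adjacent sub-thoughts into a simple linear progression graph."""
    g = ThoughtGraph(sub_thoughts=decompose(cot_text))
    for i in range(1, len(g.sub_thoughts)):
        g.edges.append((i - 1, i, infer_relation(g.sub_thoughts[i - 1],
                                                 g.sub_thoughts[i])))
    return g

def overthinking_score(graph):
    """Utility-style score: fraction of sub-thoughts after the first conclusion.

    Under a utility-based view, reasoning past the first conclusion
    (e.g. re-verification loops) adds length but no accuracy.
    """
    for _, dst, rel in graph.edges:
        if rel == "conclusion":
            wasted = len(graph.sub_thoughts) - (dst + 1)
            return wasted / len(graph.sub_thoughts)
    return 0.0

cot = ("1000 days is about 2.74 years. So the answer is roughly late 2027. "
       "Wait, let me verify the leap years. Checking again gives the same date. "
       "Therefore the answer stands.")
g = build_graph(cot)
print(overthinking_score(g))  # 3 of 5 sub-thoughts follow the first conclusion
```

On the toy trace above, two of five sub-thoughts suffice to answer, and the trailing verification loop is what a utility-based metric flags while a pure length metric would not distinguish it from productive reasoning.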
Related papers
- SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration [49.290631188365786]
Long chain-of-thought (LongCoT) is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. We propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution.
arXiv Detail & Related papers (2025-10-22T16:56:01Z)
- Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [114.83867400179354]
Overthinking can degrade the overall performance of large language models. We categorize reasoning into three stages: an insufficient exploration stage, a compensatory reasoning stage, and a reasoning convergence stage. We develop a lightweight rule-based thresholding strategy to improve reasoning accuracy.
arXiv Detail & Related papers (2025-08-25T03:17:17Z)
- OptimalThinkingBench: Evaluating Over and Underthinking in LLMs [61.90251858867122]
Thinking LLMs solve complex tasks at the expense of increased compute and overthinking on simpler problems. Non-thinking LLMs are faster and cheaper but underthink on harder reasoning problems. We introduce OptimalThinkingBench, a unified benchmark that jointly evaluates overthinking and underthinking in LLMs.
arXiv Detail & Related papers (2025-08-18T17:53:10Z)
- Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models [22.57102686737925]
Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking. We propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition.
arXiv Detail & Related papers (2025-07-03T14:24:26Z)
- Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks. They often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer. We present a quantitative analysis of overthinking from the perspective of self-doubt. We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z)
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? [27.374491920521745]
We find that the response length of reasoning LLMs drastically increases for ill-posed questions with missing premises (MiP). This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name MiP-Overthinking. Surprisingly, LLMs not specifically trained for reasoning perform much better in the MiP scenario, producing much shorter responses that quickly identify ill-posed queries.
arXiv Detail & Related papers (2025-04-09T01:25:27Z)
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs [86.79757571440082]
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks. We identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts. We propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts.
arXiv Detail & Related papers (2025-01-30T18:58:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.