Related papers: The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

URL: http://arxiv.org/abs/2502.08235v1
Date: Wed, 12 Feb 2025 09:23:26 GMT
Title: The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez,
Abstract summary: Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited.<n>This paper introduces and analyzes overthinking in LRMs.<n>We observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement.
Score: 96.27754404942364
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs. A phenomenon where models favor extended internal reasoning chains over environmental interaction. Through experiments on software engineering tasks using SWE Bench Verified, we observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement. We propose a framework to study these behaviors, which correlates with human expert assessments, and analyze 4018 trajectories. We observe that higher overthinking scores correlate with decreased performance, with reasoning models exhibiting stronger tendencies toward overthinking compared to non-reasoning models. Our analysis reveals that simple efforts to mitigate overthinking in agentic environments, such as selecting the solution with the lower overthinking score, can improve model performance by almost 30% while reducing computational costs by 43%. These results suggest that mitigating overthinking has strong practical implications. We suggest that by leveraging native function-calling capabilities and selective reinforcement learning overthinking tendencies could be mitigated. We also open-source our evaluation framework and dataset to facilitate research in this direction at https://github.com/AlexCuadron/Overthinking.

Related papers

CODE: A Contradiction-Based Deliberation Extension Framework for Overthinking Attacks on Retrieval-Augmented Generation [43.85448261466922]
We propose an end-to-end attack framework named Contradiction-Based Deliberation Extension (CODE)<n> CODE develops a multi-agent architecture to construct poisoning samples that are injected into the knowledge base.<n>Experiments show that CODE causes a 5.32x-24.72x increase in reasoning token consumption, without degrading task performance.
arXiv Detail & Related papers (2026-01-19T14:52:31Z)
Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models [12.618562275265704]
Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking.<n>We propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition.
arXiv Detail & Related papers (2025-07-03T14:24:26Z)
Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks.<n>They often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer.<n>We present a quantitative analysis of overthinking from the perspective of self-doubt.<n>We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z)
When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning [19.329523111916682]
Large reasoning models (LRMs) have significantly advanced performance on complex tasks, yet their tendency to overthink introduces inefficiencies.<n>This study investigates the internal mechanisms of reinforcement learning (RL)-trained LRMs when prompted to save thinking.
arXiv Detail & Related papers (2025-05-21T08:55:35Z)
Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought.<n>This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process.<n>We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z)
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? [27.374491920521745]
We find that the response length of reasoning LLMs drastically increases for ill-posed questions with missing premises (MiP) This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name as the MiP-Overthinking. Surprisingly, LLMs not specifically trained for reasoning exhibit much better performance on the MiP scenario, producing much shorter responses that quickly identify ill-posed queries.
arXiv Detail & Related papers (2025-04-09T01:25:27Z)
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities [101.77467538102924]
Recent advancements in Large Reasoning Models (LRMs) have demonstrated remarkable performance in specialized reasoning tasks. We show that acquiring deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs. We demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks.
arXiv Detail & Related papers (2025-03-23T08:18:51Z)
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs [86.79757571440082]
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks.<n>We identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts.<n>We propose a decoding strategy with thought switching penalty TIP that discourages premature transitions between thoughts.
arXiv Detail & Related papers (2025-01-30T18:58:18Z)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.<n>This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.<n>We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.<n>We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.<n>Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models [19.466985579720507]
Large language models (LLMs) have shown impressive emergent abilities in a wide range of tasks, but the associated expensive API cost greatly limits the real application. We propose "Synergy of Thoughts"(SoT) to unleash the synergistic potential of hybrid LLMs with different scales for efficient reasoning. We show that SoT substantially reduces the API cost by 38.3%-75.1%, and simultaneously achieves state-of-the-art reasoning accuracy and solution diversity.
arXiv Detail & Related papers (2024-02-04T16:45:01Z)
Reflexion: Language Agents with Verbal Reinforcement Learning [44.85337947858337]
Reflexion is a novel framework to reinforce language agents not by updating weights, but through linguistic feedback. It is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals. For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%.
arXiv Detail & Related papers (2023-03-20T18:08:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.