How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
- URL: http://arxiv.org/abs/2506.10979v1
- Date: Thu, 12 Jun 2025 17:59:53 GMT
- Title: How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
- Authors: Sohee Yang, Sang-Woo Lee, Nora Kassner, Daniela Gottesman, Sebastian Riedel, Mor Geva
- Abstract summary: We investigate how well reasoning models identify and recover from four types of unhelpful thoughts. We show that models are effective at identifying most unhelpful thoughts but struggle to recover from the same thoughts when these are injected into their thinking process.
- Score: 31.755709525282914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent reasoning models show the ability to reflect, backtrack, and self-validate their reasoning, which is crucial in spotting mistakes and arriving at accurate solutions. A natural question that arises is how effectively models can perform such self-reevaluation. We tackle this question by investigating how well reasoning models identify and recover from four types of unhelpful thoughts: uninformative rambling thoughts, thoughts irrelevant to the question, thoughts misdirecting the question as a slightly different question, and thoughts that lead to incorrect answers. We show that models are effective at identifying most unhelpful thoughts but struggle to recover from the same thoughts when these are injected into their thinking process, causing significant performance drops. Models tend to naively continue the line of reasoning of the injected irrelevant thoughts, which showcases that their self-reevaluation abilities are far from a general "meta-cognitive" awareness. Moreover, we observe non/inverse-scaling trends, where larger models struggle more than smaller ones to recover from short irrelevant thoughts, even when instructed to reevaluate their reasoning. We demonstrate the implications of these findings with a jailbreak experiment using irrelevant thought injection, showing that the smallest models are the least distracted by harmful-response-triggering thoughts. Overall, our findings call for improvement in self-reevaluation of reasoning models to develop better reasoning and safer systems.
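To make the evaluation setup concrete, below is a minimal sketch of the thought-injection idea described in the abstract. It is not the authors' code: the `<think>` delimiter, the injection position, and the substring-based correctness check are all assumptions for illustration. An unhelpful (here, irrelevant) thought is placed at the start of the model's thinking block, and recovery is scored by whether the final answer is still correct.

```python
# Minimal sketch (assumptions, not the paper's implementation) of injecting an
# unhelpful thought into a reasoning model's thinking trace and scoring recovery.

def build_injected_prompt(question: str, injected_thought: str) -> str:
    """Prefix the model's reasoning with an unhelpful thought.

    The paper studies four thought types (uninformative rambling, irrelevant,
    question-misdirecting, and incorrect-answer-leading); this sketch shows an
    irrelevant one. The <think> tag format is assumed for illustration.
    """
    return (
        f"Question: {question}\n"
        "<think>\n"
        f"{injected_thought}\n"  # the unhelpful content the model must recover from
    )


def is_recovered(model_answer: str, gold_answer: str) -> bool:
    """Recovery = the model still reaches the correct answer despite the injection."""
    return gold_answer.strip().lower() in model_answer.strip().lower()


if __name__ == "__main__":
    prompt = build_injected_prompt(
        question="What is 17 * 24?",
        injected_thought="I wonder what the weather is like in Paris today...",
    )
    print(prompt)
    # A real experiment would send `prompt` to a reasoning model, let it continue
    # the thinking block, and score recovery with is_recovered(output, "408").
```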
Related papers
- Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? [7.423494663010787]
Reasoning language models have set state-of-the-art (SOTA) records on many challenging benchmarks. Like previous language models, reasoning models are prone to generating confident, plausible responses that are incorrect. Knowing when and how much to trust these models is critical to the safe deployment of reasoning models in real-world applications.
arXiv Detail & Related papers (2025-06-22T21:46:42Z) - Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models [103.03315678501546]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: does thinking more at test-time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking".
arXiv Detail & Related papers (2025-06-04T17:55:09Z) - Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks. They often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer. We present a quantitative analysis of overthinking from the perspective of self-doubt. We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z) - Internal Bias in Reasoning Models leads to Overthinking [58.817405319722596]
We show for the first time that overthinking in reasoning models may stem from their internal bias towards input texts. By masking out the original input section, the effect of internal bias can be effectively alleviated and the reasoning length can be reduced by 31%-53%.
arXiv Detail & Related papers (2025-05-22T09:35:52Z) - Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during generation. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking by allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z) - Thinking Out Loud: Do Reasoning Models Know When They're Right? [19.776645881640178]
Large reasoning models (LRMs) have recently demonstrated impressive capabilities in complex reasoning tasks. We investigate how LRMs interact with other model behaviors by analyzing verbalized confidence. We find that reasoning models may possess a diminished awareness of their own knowledge boundaries.
arXiv Detail & Related papers (2025-04-09T03:58:19Z) - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [96.27754404942364]
Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs. We observe three recurring patterns: Analysis Paralysis, Rogue Actions, and Premature Disengagement.
arXiv Detail & Related papers (2025-02-12T09:23:26Z) - Contrastive Chain-of-Thought Prompting [74.10511560147293]
We propose contrastive chain of thought to enhance language model reasoning.
Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations.
Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
arXiv Detail & Related papers (2023-11-15T18:54:01Z)
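As a rough illustration of the contrastive prompting idea in the entry above, the sketch below pairs a valid reasoning demonstration with an invalid one before posing a new question. The demonstrations and prompt format are invented for illustration and are not taken from the paper.

```python
# Rough sketch (not the paper's implementation) of a contrastive chain-of-thought
# prompt: the model sees both a correct and an incorrect reasoning demonstration
# before answering a new question.

VALID_DEMO = (
    "Q: A shop sells pens at $2 each. How much do 5 pens cost?\n"
    "Correct reasoning: 5 pens * $2 per pen = $10. The answer is $10."
)

INVALID_DEMO = (
    "Q: A shop sells pens at $2 each. How much do 5 pens cost?\n"
    "Incorrect reasoning: 5 pens + $2 = $7. The answer is $7. "
    "(Wrong: the quantities should be multiplied, not added.)"
)


def contrastive_cot_prompt(question: str) -> str:
    """Assemble a contrastive CoT prompt for a new question (format assumed)."""
    return f"{VALID_DEMO}\n\n{INVALID_DEMO}\n\nQ: {question}\nCorrect reasoning:"


if __name__ == "__main__":
    print(contrastive_cot_prompt("A ticket costs $12. How much do 3 tickets cost?"))
```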
This list is automatically generated from the titles and abstracts of the papers in this site.