On the Self-awareness of Large Reasoning Models' Capability Boundaries
- URL: http://arxiv.org/abs/2509.24711v2
- Date: Sun, 05 Oct 2025 08:47:48 GMT
- Title: On the Self-awareness of Large Reasoning Models' Capability Boundaries
- Authors: Qingjie Zhang, Yujia Fu, Yang Wang, Liu Yan, Tao Wei, Ke Xu, Minlie Huang, Han Qiu
- Abstract summary: We investigate whether Large Reasoning Models (LRMs) possess self-awareness of their capability boundaries. For black-box models, reasoning expressions reveal boundary signals, with an accelerating confidence trajectory for solvable problems but a converging uncertainty trajectory for unsolvable ones. For white-box models, we show that the hidden states of the last input token encode boundary information, with solvable and unsolvable problems linearly separable even before reasoning begins.
- Score: 46.74014595035246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Reasoning Models (LRMs) have shown impressive performance on complex reasoning tasks such as mathematics, yet they also display misbehaviors that expose their limitations. In particular, when faced with hard questions, LRMs often engage in unproductive reasoning until they hit the context limit, producing wrong answers while wasting substantial computation. This phenomenon reflects a fundamental issue: current answering paradigms overlook the relationship between questions and LRMs' capability boundaries. In this paper, we investigate whether LRMs possess self-awareness of their capability boundaries. We begin with the observation that LRMs may know what they cannot solve, as revealed through their expressed reasoning confidence. For black-box models, we find that reasoning expressions reveal boundary signals, with an accelerating confidence trajectory for solvable problems but a converging uncertainty trajectory for unsolvable ones. For white-box models, we show that the hidden states of the last input token encode boundary information, with solvable and unsolvable problems linearly separable even before reasoning begins. Building on these findings, we propose two simple yet effective optimization strategies: reasoning expression monitoring and hidden states monitoring. Experiments demonstrate that these boundary-aware strategies enable LRMs to avoid unproductive reasoning without sacrificing accuracy, significantly improving reliability and efficiency by cutting token usage by 62.7% to 93.6%.
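The two strategies named in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the model name, the hedging/confidence phrase lists, the window size, and the abort rule are all illustrative assumptions; only the general idea (a linear probe on the last input token's hidden state for white-box models, and monitoring the expressed-confidence trajectory for black-box models) comes from the abstract.

```python
# Hypothetical sketch of the two boundary-aware strategies described above.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- White-box: probe the hidden state of the last input token -------------
# The abstract reports that solvable vs. unsolvable questions are linearly
# separable from this representation before any reasoning is generated.
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # example LRM (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
lrm = AutoModelForCausalLM.from_pretrained(model_name)

def last_token_hidden_state(question: str) -> np.ndarray:
    """Final-layer hidden state of the last prompt token."""
    inputs = tok(question, return_tensors="pt")
    with torch.no_grad():
        out = lrm(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1, :].float().numpy()

def fit_boundary_probe(questions, solvable_labels):
    """Fit a linear probe: 1 = within capability boundary, 0 = beyond it."""
    X = np.stack([last_token_hidden_state(q) for q in questions])
    return LogisticRegression(max_iter=1000).fit(X, solvable_labels)

# --- Black-box: monitor expressed confidence in the reasoning trace --------
# A crude proxy: score hedging vs. confident phrases per window and abort if
# the confidence trajectory never grows (illustrative heuristic only).
HEDGES = ("not sure", "maybe", "hmm", "let me try another", "stuck")
CONFIDENT = ("therefore", "so the answer is", "clearly", "we conclude")

def confidence_score(window: str) -> float:
    w = window.lower()
    return sum(w.count(p) for p in CONFIDENT) - sum(w.count(p) for p in HEDGES)

def should_abort(trace: str, window_words: int = 200, patience: int = 3) -> bool:
    """Abort if confidence fails to grow over the last `patience` windows."""
    words = trace.split()
    windows = [" ".join(words[i:i + window_words])
               for i in range(0, len(words), window_words)]
    scores = [confidence_score(w) for w in windows]
    if len(scores) < patience + 1:
        return False
    baseline = scores[-patience - 1]
    return all(s <= baseline for s in scores[-patience:])
```

In this sketch the probe could be queried once per question before generation, while `should_abort` would be called periodically on the partial reasoning trace; both thresholds and phrase lists would need tuning on held-out data.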
Related papers
- Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense [16.519353449118814]
We analyze a critical vulnerability we term reasoning distraction, where LRMs are diverted from their primary objective by irrelevant yet complex tasks maliciously embedded in the prompt. We show that even state-of-the-art LRMs are highly susceptible, with injected distractors reducing task accuracy by up to 60%. We propose a training-based defense that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on synthetic adversarial data, improving robustness by over 50 points on challenging distractor attacks.
arXiv Detail & Related papers (2025-10-17T23:16:34Z) - One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z) - From Long to Short: LLMs Excel at Trimming Own Reasoning Chains [48.692414597960244]
O1/R1-style large reasoning models (LRMs) signal a substantial leap forward over conventional instruction-following LLMs. Recent studies show that LRMs are prone to overthinking. We propose a test-time scaling method, EDIT, which efficiently guides LRMs to identify the shortest correct reasoning paths at test time.
arXiv Detail & Related papers (2025-09-07T19:00:44Z) - Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models [36.56061020865792]
Large reasoning models (LRMs) have shown remarkable progress on complex reasoning tasks. We find that LRMs continually fail to provide appropriate abstentions when confronted with unanswerable questions. We propose a lightweight, two-stage method that combines cognitive monitoring with inference-time intervention.
arXiv Detail & Related papers (2025-08-26T07:37:56Z) - Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models [12.618562275265704]
Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking. We propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition.
arXiv Detail & Related papers (2025-07-03T14:24:26Z) - Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs [8.359909829007005]
We investigate whether efficient reasoning strategies introduce behavioral inconsistencies in large reasoning models (LRMs). ICBENCH is a benchmark designed to measure inconsistency in LRMs across three dimensions. We find that while larger models generally exhibit greater consistency than smaller ones, they all display widespread "scheming" behaviors.
arXiv Detail & Related papers (2025-06-24T10:25:28Z) - On Reasoning Strength Planning in Large Reasoning Models [50.61816666920207]
We find evidence that LRMs pre-plan the reasoning strengths in their activations even before generation. We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors.
arXiv Detail & Related papers (2025-06-10T02:55:13Z) - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [16.266145641151375]
Large Reasoning Models generate detailed thinking processes before providing answers. We show that LRMs face a complete accuracy collapse beyond certain complexities. We also investigate the reasoning traces in more depth, studying the patterns of explored solutions.
arXiv Detail & Related papers (2025-06-07T22:42:29Z) - Let LRMs Break Free from Overthinking via Self-Braking Tuning [68.93713497579853]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z) - BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs [87.24843751412783]
We propose BARREL, a framework that promotes concise and boundary-aware factual reasoning. Our experiments show that BARREL training increases the reliability of DeepSeek-R1-Distill-Llama-8B from 39.33% to 61.48%.
arXiv Detail & Related papers (2025-05-18T07:27:34Z) - Efficient Inference for Large Reasoning Models: A Survey [74.17203483365171]
Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason. However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time. This survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving reasoning quality.
arXiv Detail & Related papers (2025-03-29T13:27:46Z) - To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models [56.19026073319406]
Large Reasoning Models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing final answers. We reveal a critical vulnerability in LRMs, termed Unthinking, wherein the thinking process can be bypassed by manipulating special tokens. In this paper, we investigate this vulnerability from both malicious and beneficial perspectives.
arXiv Detail & Related papers (2025-02-16T10:45:56Z)