Related papers: Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

URL: http://arxiv.org/abs/2508.11582v1
Date: Fri, 15 Aug 2025 16:40:29 GMT
Title: Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
Authors: Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, Wanxiang Che,
Abstract summary: We introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF)<n>DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism.<n>Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy.
Score: 38.225442399592936
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent advancements in large language models (LLMs) have greatly improved their capabilities on complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve the efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's self-awared difficulty, leading to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables models to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. During extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency with more than 16% accuracy improvement.

Related papers

Training Large Reasoning Models Efficiently via Progressive Thought Encoding [63.254758972725654]
Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency.<n>We introduce Progressive Thought, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches.
arXiv Detail & Related papers (2026-02-18T20:03:38Z)
EntroCut: Entropy-Guided Adaptive Truncation for Efficient Chain-of-Thought Reasoning in Small-scale Large Reasoning Models [42.49934375597466]
Large Reasoning Models (LRMs) excel at complex reasoning tasks through extended chain-of-thought generation.<n>We find that the entropy of the model's output distribution in early reasoning steps reliably distinguishes correct from incorrect reasoning.<n>We propose EntroCut, a training-free method that dynamically truncates reasoning by identifying high-confidence states.
arXiv Detail & Related papers (2026-01-30T06:19:16Z)
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering [62.63376387138257]
We propose a plug-and-play intervention framework that adaptively steers large language models (LLMs) reasoning in activation space.<n>RISER constructs a library of reusable reasoning vectors and employs a lightweight Router to dynamically compose them for each input.<n>The Router is optimized via reinforcement learning under task-level rewards, activating latent cognitive primitives in an emergent and compositional manner.
arXiv Detail & Related papers (2026-01-14T08:04:33Z)
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking [60.35109192765302]
Information seeking is a core capability that enables autonomous reasoning and decision-making.<n>We propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories.<n>Our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
arXiv Detail & Related papers (2025-10-28T17:51:42Z)
ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models [0.0]
Reasoning Suppression (ARS) is a training-free approach that dynamically suppresses redundant reasoning steps.<n>ARS achieves up to 53%, 46.1%, and 57.9% in token, latency and energy reduction, while maintaining or improving accuracy.
arXiv Detail & Related papers (2025-09-29T20:19:41Z)
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking [50.97239453902612]
Large Reasoning Models (LRMs) have achieved impressive performance on challenging tasks, yet their deep reasoning often incurs substantial computational costs.<n>Inspired by Evidence Accumulation Models, we find that LRMs have accumulated sufficient information early in reasoning, making further reasoning steps redundant.<n>We propose Just-Enough Thinking (JET), which trains models to proactively terminate unnecessary reasoning.
arXiv Detail & Related papers (2025-09-27T16:25:06Z)
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control [18.273777938294327]
Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chain-of-thoughts.<n>We introduce AALC, a lightweight, accuracy-aware length reward integrated into reinforcement learning.<n>We show that our approach reduces response length by over 50% while maintaining or even improving the original accuracy.
arXiv Detail & Related papers (2025-06-25T06:29:18Z)
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning [95.28059121743831]
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks.<n>We introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation.<n>SwS enables robust generalization byempowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10.0% and 7.7% on 7B and 32B models.
arXiv Detail & Related papers (2025-06-10T17:02:00Z)
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase the output length due to overthinking.<n>We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns.<n>Our method achieves up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z)
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training [63.99981166397423]
Recent large language models (LLMs) exhibit impressive reasoning but often over-think, generating excessively long responses that hinder efficiency.<n>We introduce DIET, a framework that systematically cuts these "token calories" by integrating on-the-fly problem difficulty into the reinforcement learning process.<n> DIET dynamically adapts token compression strategies by modulating token penalty strength and conditioning target lengths on estimated task difficulty.
arXiv Detail & Related papers (2025-05-25T16:24:12Z)
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens [51.90059610606049]
This paper revisits the efficiency of such reasoning processes through an information-theoretic lens.<n>We propose two metrics, InfoBias and InfoGain, to quantify divergence from ideal reasoning paths and stepwise information contribution.<n>Motivated by these findings, we introduce an entropy-based Adaptive Think strategy that dynamically halts reasoning once confidence is sufficiently high.
arXiv Detail & Related papers (2025-05-23T13:38:56Z)
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models [2.9828816765661363]
We propose Distilled Reasoning Pruning ( traces), a hybrid framework that combines inference-time pruning with tuning-based distillation.<n>We find that models trained with traces achieve substantial improvements in token efficiency without sacrificing accuracy.<n>Further analysis shows that aligning the reasoning structure of training CoTs with the student's reasoning capacity is critical for effective knowledge transfer and performance gains.
arXiv Detail & Related papers (2025-05-20T06:15:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.