Mitigating Overthinking in Large Reasoning Models via Manifold Steering
- URL: http://arxiv.org/abs/2505.22411v1
- Date: Wed, 28 May 2025 14:39:26 GMT
- Title: Mitigating Overthinking in Large Reasoning Models via Manifold Steering
- Authors: Yao Huang, Huanran Chen, Shouwei Ruan, Yichi Zhang, Xingxing Wei, Yinpeng Dong
- Abstract summary: Large Reasoning Models (LRMs) exhibit a phenomenon known as overthinking during inference. We propose Manifold Steering, a novel approach that elegantly projects the steering direction onto the low-dimensional activation manifold. Our method reduces output tokens by up to 71% while maintaining and even improving the accuracy on several mathematical benchmarks.
- Score: 32.666911833023526
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex tasks such as mathematics and coding. However, these models frequently exhibit a phenomenon known as overthinking during inference, characterized by excessive validation loops and redundant deliberation, leading to substantial computational overhead. In this paper, we aim to mitigate overthinking by investigating the underlying mechanisms from the perspective of mechanistic interpretability. We first show that the tendency to overthink can be effectively captured by a single direction in the model's activation space, and that the issue can be eased by intervening on the activations along this direction. However, this efficacy soon reaches a plateau and even deteriorates as the intervention strength increases. We therefore systematically explore the activation space and find that the overthinking phenomenon is actually tied to a low-dimensional manifold, which indicates that the limited effect stems from the noise introduced by the high-dimensional steering direction. Based on this insight, we propose Manifold Steering, a novel approach that elegantly projects the steering direction onto the low-dimensional activation manifold, based on a theoretical approximation of the interference noise. Extensive experiments on DeepSeek-R1 distilled models validate that our method reduces output tokens by up to 71% while maintaining and even improving accuracy on several mathematical benchmarks. Our method also exhibits robust cross-domain transferability, delivering consistent token reduction in code generation and knowledge-based QA tasks. Code is available at: https://github.com/Aries-iai/Manifold_Steering.
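The pipeline described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes layer activations have already been collected from overthinking and concise reasoning traces (`hidden_overthink`, `hidden_concise`), uses a PCA subspace as a stand-in for the low-dimensional activation manifold, and applies the steering via a standard PyTorch forward hook on a Hugging Face-style decoder layer. The layer index, manifold dimension `k`, and strength `alpha` are illustrative values, not the paper's.

```python
# Hedged sketch of manifold-projected activation steering (not the authors' code).
# Assumes `hidden_overthink` and `hidden_concise` are (N, d) tensors of activations
# from a chosen layer, collected on overthinking vs. concise reasoning traces.
import torch

def overthinking_direction(hidden_overthink: torch.Tensor,
                           hidden_concise: torch.Tensor) -> torch.Tensor:
    """Single steering direction as a normalized difference of mean activations."""
    d = hidden_overthink.mean(dim=0) - hidden_concise.mean(dim=0)
    return d / d.norm()

def manifold_basis(activations: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Top-k principal directions of the activation cloud, used here as a
    stand-in for the low-dimensional activation manifold from the paper."""
    centered = activations - activations.mean(dim=0, keepdim=True)
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return vh[:k].T  # (d, k) orthonormal basis

def project_onto_manifold(direction: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Drop the components of the steering direction outside the manifold,
    which the paper argues act as interference noise."""
    proj = basis @ (basis.T @ direction)
    return proj / proj.norm()

def steering_hook(direction: torch.Tensor, alpha: float = 4.0):
    """Forward hook that shifts hidden states against the overthinking direction."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden - alpha * direction.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Example wiring (hypothetical layer index and strength):
# basis = manifold_basis(torch.cat([hidden_overthink, hidden_concise]), k=32)
# d = project_onto_manifold(overthinking_direction(hidden_overthink, hidden_concise), basis)
# handle = model.model.layers[20].register_forward_hook(steering_hook(d, alpha=4.0))
```

Under these assumptions, projecting the raw difference-of-means direction onto the leading principal subspace is what removes the high-dimensional components the abstract calls interference noise; the hook then applies the cleaned direction at every decoding step.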
Related papers
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z) - Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
Large reasoning models (LRMs) have significantly enhanced their reasoning capabilities by generating longer chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during the generation process. We propose a novel framework, Self-Braking Tuning (SBT), which tackles overthinking by allowing the model to regulate its own reasoning process.
arXiv Detail & Related papers (2025-05-20T16:53:40Z) - SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z) - Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs [77.66717051042032]
Practitioners have consistently observed three puzzling phenomena in transformer-based large language models.
These phenomena are characterized by certain so-called "sink tokens" receiving disproportionately high attention weights.
We elucidate the mechanisms behind extreme-token phenomena.
arXiv Detail & Related papers (2024-10-17T17:54:06Z) - FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand. The proposed approach achieves superior performance and higher run-time efficiency than strong baselines. We also demonstrate the substantial benefits of greater diversity for grasping objects in clutter and in a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.