Related papers: Controllable LLM Reasoning via Sparse Autoencoder-Based Steering

URL: http://arxiv.org/abs/2601.03595v1
Date: Wed, 07 Jan 2026 05:26:26 GMT
Title: Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
Authors: Yi Fang, Wenjie Wang, Mingfeng Xue, Boyi Deng, Fengli Xu, Dayiheng Liu, Fuli Feng
Abstract summary: Large Reasoning Models (LRMs) exhibit human-like cognitive reasoning strategies. Currently, reasoning strategies are autonomously selected by LRMs themselves. Existing methods struggle to control fine-grained reasoning strategies due to conceptual entanglement in LRMs' hidden states.
Score: 66.36947132041657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Models (LRMs) exhibit human-like cognitive reasoning strategies (e.g., backtracking, cross-verification) during the reasoning process, which improves their performance on complex tasks. Currently, reasoning strategies are autonomously selected by LRMs themselves. However, such autonomous selection often produces inefficient or even erroneous reasoning paths. To make reasoning more reliable and flexible, it is important to develop methods for controlling reasoning strategies. Existing methods struggle to control fine-grained reasoning strategies due to conceptual entanglement in LRMs' hidden states. To address this, we leverage Sparse Autoencoders (SAEs) to decompose strategy-entangled hidden states into a disentangled feature space. To identify the few strategy-specific features from the vast pool of SAE features, we propose SAE-Steering, an efficient two-stage feature identification pipeline. SAE-Steering first recalls features that amplify the logits of strategy-specific keywords, filtering out over 99% of features, and then ranks the remaining features by their control effectiveness. Using the identified strategy-specific features as control vectors, SAE-Steering outperforms existing methods by over 15% in control effectiveness. Furthermore, controlling reasoning strategies can redirect LRMs from erroneous paths to correct ones, achieving a 7% absolute accuracy improvement.
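
The two-stage pipeline described in the abstract (recall by keyword-logit amplification, then rank by control effectiveness) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a feature's logit effect can be approximated by projecting its SAE decoder direction through the model's unembedding matrix, and that control effectiveness is supplied by the caller as a scoring function. All names here (`logit_effect`, `recall_features`, `rank_features`, `steer`, `top_k`, `strength`) are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def logit_effect(direction: Vector, unembed: Sequence[Vector]) -> list[float]:
    """Approximate per-token logit change from adding `direction` to a hidden state.

    `unembed` is laid out as (d_model, vocab). Assumption: a plain linear
    projection, ignoring any final normalization layer.
    """
    vocab = len(unembed[0])
    return [
        sum(direction[k] * unembed[k][t] for k in range(len(direction)))
        for t in range(vocab)
    ]


def recall_features(
    decoder: Sequence[Vector],
    unembed: Sequence[Vector],
    keyword_ids: Sequence[int],
    top_k: int = 10,
) -> list[int]:
    """Stage 1 (sketch): keep features whose top-k boosted tokens include a keyword."""
    keywords = set(keyword_ids)
    kept = []
    for idx, direction in enumerate(decoder):
        effect = logit_effect(direction, unembed)
        top = sorted(range(len(effect)), key=effect.__getitem__, reverse=True)[:top_k]
        if keywords.intersection(top):
            kept.append(idx)
    return kept


def rank_features(
    candidates: Sequence[int], control_score: Callable[[int], float]
) -> list[int]:
    """Stage 2 (sketch): order recalled features by a caller-supplied score."""
    return sorted(candidates, key=control_score, reverse=True)


def steer(
    hidden: Vector, decoder: Sequence[Vector], feature_id: int, strength: float = 4.0
) -> list[float]:
    """Use a feature's decoder direction as a control vector added to a hidden state."""
    return [h + strength * d for h, d in zip(hidden, decoder[feature_id])]
```

In practice, `control_score` would need to run the model with and without steering and measure how often the target strategy appears. The sketch leaves that evaluation to the caller because the abstract does not specify it.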

Related papers

Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance [86.46794021499511]
We show a previously underexplored gap between strategy usage and strategy executability. We propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability. SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance.
arXiv Detail & Related papers (2026-02-26T03:34:23Z)
Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features [1.5874067490843806]
Control Reinforcement Learning (CRL) trains a policy to select SAE features for steering at each token, producing interpretable intervention logs. Adaptive Feature Masking encourages diverse feature discovery while preserving single-feature interpretability. On Gemma 2 2B across MMLU, BBQ, GSM8K, HarmBench, and XSTest, CRL achieves improvements while providing per-token intervention logs.
arXiv Detail & Related papers (2026-02-11T02:28:49Z)
AI Agent Systems for Supply Chains: Structured Decision Prompts and Memory Retrieval [3.3703751888858675]
This study investigates large language model (LLM)-based multi-agent systems (MASs) as a promising approach to inventory management. It is unclear whether LLM-based MASs can consistently derive optimal ordering policies and adapt to diverse supply chain scenarios.
arXiv Detail & Related papers (2026-02-05T10:35:00Z)
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering [62.63376387138257]
We propose a plug-and-play intervention framework that adaptively steers the reasoning of large language models (LLMs) in activation space. RISER constructs a library of reusable reasoning vectors and employs a lightweight Router to dynamically compose them for each input. The Router is optimized via reinforcement learning under task-level rewards, activating latent cognitive primitives in an emergent and compositional manner.
arXiv Detail & Related papers (2026-01-14T08:04:33Z)
Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models [39.03483371038282]
CogER is a framework inspired by human hierarchical reasoning. For queries requiring external tools, we introduce Cognitive Tool-Assisted Reasoning. CogER outperforms state-of-the-art test-time scaling methods.
arXiv Detail & Related papers (2025-12-17T05:11:58Z)
ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization [8.765548346606218]
Large Reasoning Models (LRMs) are powerful, but they still suffer from inefficient and off-target reasoning. In this paper, we introduce ThinkPilot, a training-free framework that automatically optimizes LRM reasoning. It uses an evolutionary process to generate think-prefixes, which are instructions that evolve under the guidance of a taxonomy of reasoning behaviors.
arXiv Detail & Related papers (2025-10-14T02:02:19Z)
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z)
Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs [49.995906301946]
Existing methods usually leverage a fixed strategy to guide Large Language Models (LLMs) to perform mathematical reasoning. Our analysis reveals that a single strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency. We propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution.
arXiv Detail & Related papers (2025-09-29T07:22:41Z)
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection [7.045509749924679]
Route-To-Reason (RTR) is a novel unified routing framework that dynamically allocates both LMs and reasoning strategies according to task difficulty under budget constraints. RTR learns compressed representations of both expert models and reasoning strategies, enabling their joint and adaptive selection at inference time.
arXiv Detail & Related papers (2025-05-26T02:53:17Z)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. We train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
Zero-Shot Strategies for Length-Controllable Summarization [56.15356055672189]
Large language models (LLMs) struggle with precise length control, particularly in zero-shot settings. We conduct a comprehensive study evaluating LLMs' length control capabilities across multiple measures and propose practical methods to improve controllability. Our experiments with LLaMA 3 reveal stark differences in length adherence across measures and highlight inherent biases of the model.
arXiv Detail & Related papers (2024-12-31T02:53:27Z)
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks [44.45037694899524]
We introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to learn and select the most effective strategies for various reasoning tasks. We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement. Our experiments demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance.
arXiv Detail & Related papers (2024-10-21T15:55:04Z)
CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control [26.21425058462886]
Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. We present the first attempts to solve adaptive RAG from a representation perspective and develop an inherent control-based framework, termed CtrlA. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks.
arXiv Detail & Related papers (2024-05-29T03:17:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.