Related papers: Angular Steering: Behavior Control via Rotation in Activation Space

URL: http://arxiv.org/abs/2510.26243v1
Date: Thu, 30 Oct 2025 08:23:35 GMT
Title: Angular Steering: Behavior Control via Rotation in Activation Space
Authors: Hieu M. Vu, Tan M. Nguyen
Abstract summary: Angular Steering is a novel and flexible method for behavior modulation. It operates by rotating activations within a fixed two-dimensional subspace. It provides continuous, fine-grained control over behaviors such as refusal and compliance.
Score: 1.3400719989424488
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Controlling specific behaviors in large language models while preserving their general capabilities is a central challenge for safe and reliable artificial intelligence deployment. Current steering methods, such as vector addition and directional ablation, are constrained within a two-dimensional subspace defined by the activation and feature direction, making them sensitive to chosen parameters and potentially affecting unrelated features due to unintended interactions in activation space. We introduce Angular Steering, a novel and flexible method for behavior modulation that operates by rotating activations within a fixed two-dimensional subspace. By formulating steering as a geometric rotation toward or away from a target behavior direction, Angular Steering provides continuous, fine-grained control over behaviors such as refusal and compliance. We demonstrate this method using refusal steering and emotion steering as use cases. Additionally, we propose Adaptive Angular Steering, a selective variant that rotates only activations aligned with the target feature, further enhancing stability and coherence. Angular Steering generalizes existing addition and orthogonalization techniques under a unified geometric rotation framework, simplifying parameter selection and maintaining model stability across a broader range of adjustments. Experiments across multiple model families and sizes show that Angular Steering achieves robust behavioral control while maintaining general language modeling performance, underscoring its flexibility, generalization, and robustness compared to prior approaches. Code and artifacts are available at https://github.com/lone17/angular-steering/.
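
Illustration (not from the paper): the abstract describes steering as a rotation of activations inside a fixed two-dimensional subspace. The sketch below shows one generic way to rotate a vector within a plane while leaving the component outside that plane untouched. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `orthonormal_plane` and `rotate_in_plane`, the choice of Gram-Schmidt to build the plane, and the use of a relative rotation angle `theta` are all hypothetical choices made for this example. For the actual method, see the linked repository.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def orthonormal_plane(d1, d2):
    """Gram-Schmidt: orthonormal basis (b1, b2) of the plane spanned by d1 and d2.

    Assumption for this sketch: d1 plays the role of a feature direction and
    d2 is any second, non-parallel direction that fixes the plane.
    """
    n1 = norm(d1)
    if n1 < 1e-12:
        raise ValueError("d1 must be non-zero")
    b1 = [a / n1 for a in d1]
    c = dot(d2, b1)
    residual = [a - c * b for a, b in zip(d2, b1)]
    n2 = norm(residual)
    if n2 < 1e-12:
        raise ValueError("d1 and d2 must not be parallel")
    b2 = [a / n2 for a in residual]
    return b1, b2


def rotate_in_plane(x, b1, b2, theta):
    """Rotate the in-plane part of x by theta radians (from b1 toward b2).

    The component of x orthogonal to the plane is left unchanged, so the
    norm of x is preserved.
    """
    p1, p2 = dot(x, b1), dot(x, b2)          # coordinates of x in the plane
    q1 = math.cos(theta) * p1 - math.sin(theta) * p2
    q2 = math.sin(theta) * p1 + math.cos(theta) * p2
    return [xi + (q1 - p1) * u + (q2 - p2) * v for xi, u, v in zip(x, b1, b2)]


# Toy usage: a 3-dimensional "activation" rotated by 90 degrees in the x-y plane.
b1, b2 = orthonormal_plane([1.0, 0.0, 0.0], [1.0, 1.0, 0.0])
steered = rotate_in_plane([1.0, 0.0, 2.0], b1, b2, math.pi / 2)
print(steered, norm(steered))
```

Because only the two in-plane coordinates change, the vector's norm stays the same for every angle. This is one way to read the abstract's claim that rotation maintains stability across a broader range of adjustments than additive steering.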

Related papers

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation [55.467742403416175]
We introduce a physics-driven neural algorithm that translates large-scale motion capture to humanoid embodiments. We learn a unified multimodal controller that supports both dense references and sparse task specifications. Results show that ULTRA generalizes to autonomous, goal-conditioned whole-body loco-manipulation from egocentric perception.
arXiv Detail & Related papers (2026-03-03T18:59:29Z)
Understanding Unreliability of Steering Vectors in Language Models: Geometric Predictors and the Limits of Linear Approximations [0.0]
I investigate why steering reliability differs across behaviors and how it is impacted by steering vector training data. I find that higher cosine similarity between training activation differences predicts more reliable steering. I observe that behavior datasets where positive and negative activations are better separated along the steering direction are more reliably steerable.
arXiv Detail & Related papers (2026-02-19T22:37:05Z)
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment [49.68063561145927]
We propose a unified ordinary differential equations (ODEs)-based theoretical framework for activation steering. We introduce ODESteer, a kind of ODE-based steering guided by barrier functions. Compared to state-of-the-art activation steering methods, ODESteer achieves consistent empirical improvements.
arXiv Detail & Related papers (2026-02-19T17:13:44Z)
Spherical Steering: Geometry-Aware Activation Rotation for Language Models [15.078810641141295]
Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Our method rotates activations along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal.
arXiv Detail & Related papers (2026-02-09T00:15:47Z)
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation. We present a unified view that frames these interventions as dynamic weight updates induced by a control signal. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection [1.7802147489386628]
Large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. We propose Selective Steering, which addresses these limitations through two key innovations. Experiments across nine models demonstrate that Selective Steering achieves 5.5x higher attack success rates than prior methods.
arXiv Detail & Related papers (2026-01-27T08:56:25Z)
Dynamically Scaled Activation Steering [3.177576903071419]
We introduce Dynamically Scaled Activation Steering (DSAS), a method-agnostic steering framework that decouples when to steer from how to steer. DSAS adaptively modulates the strength of existing steering transformations across layers and inputs, intervening strongly only when undesired behavior is detected.
arXiv Detail & Related papers (2025-12-03T10:50:15Z)
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration [17.225716209866086]
We propose a position-wise activation steering framework for large language models (LLMs) on the web. PIXEL learns a property-aligned subspace from dual views and selects intervention strength via a constrained geometric objective. PIXEL consistently improves attribute alignment while preserving model general capabilities.
arXiv Detail & Related papers (2025-10-11T13:13:34Z)
Activation Steering with a Feedback Controller [4.609594868699996]
Proportional-Integral-Derivative (PID) Steering is a principled framework that leverages the full PID controller for activation steering in large language models. PID Steering consistently outperforms existing approaches, achieving more robust and reliable behavioral control.
arXiv Detail & Related papers (2025-10-05T18:05:28Z)
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms [71.85633762642125]
The vast number of parameters in models often results in highly intertwined internal representations. Recent research has explored the use of sparse autoencoders (SAE) to disentangle knowledge in high-dimensional spaces for steering. We propose Steering Target Atoms (STA), a novel method that isolates and manipulates disentangled knowledge components to enhance safety.
arXiv Detail & Related papers (2025-05-23T17:59:18Z)
Analyzing the Generalization and Reliability of Steering Vectors [8.253773195379166]
We show that steering vectors have substantial limitations both in- and out-of-distribution. In-distribution, steerability is highly variable across different inputs. Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt.
arXiv Detail & Related papers (2024-07-17T08:32:03Z)
OmniControl: Control Any Joint at Any Time for Human Motion Generation [46.293854851116215]
We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model. We propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals. At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion.
arXiv Detail & Related papers (2023-10-12T17:59:38Z)
DeepMLS: Geometry-Aware Control Point Deformation [76.51312491336343]
We introduce DeepMLS, a space-based deformation technique, guided by a set of displaced control points. We leverage the power of neural networks to inject the underlying shape geometry into the deformation parameters. Our technique facilitates intuitive piecewise smooth deformations, which are well suited for manufactured objects.
arXiv Detail & Related papers (2022-01-05T23:55:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.