Spherical Steering: Geometry-Aware Activation Rotation for Language Models
- URL: http://arxiv.org/abs/2602.08169v1
- Date: Mon, 09 Feb 2026 00:15:47 GMT
- Title: Spherical Steering: Geometry-Aware Activation Rotation for Language Models
- Authors: Zejia You, Chunyuan Deng, Hanjie Chen
- Abstract summary: Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Our method rotates activations along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation addition, a geometric operation that inevitably alters the magnitude of hidden representations. This raises concerns about representation collapse and degradation of open-ended generation capabilities. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (notably by +10% on TruthfulQA, COPA, and Storycloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control.
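The abstract's two ingredients can be sketched concretely. A minimal, hypothetical implementation of geodesic rotation is spherical linear interpolation (slerp) between the normalized activation and the target direction, with the original norm restored afterward; the entropy-based confidence gate below is likewise an illustrative assumption, since the paper's exact formulas are not given here. The function names `spherical_steer` and `confidence_gate` are our own, not the authors'.

```python
import numpy as np

def spherical_steer(h, v, alpha):
    """Rotate activation h along the geodesic toward direction v.

    Norm-preserving sketch (slerp on the unit sphere, then rescale):
    alpha in [0, 1], where 0 leaves h unchanged and 1 aligns it with v.
    """
    norm = np.linalg.norm(h)
    u = h / norm                          # unit-length activation
    v = v / np.linalg.norm(v)             # unit-length target direction
    theta = np.arccos(np.clip(u @ v, -1.0, 1.0))  # geodesic angle
    if theta < 1e-8:                      # already aligned; nothing to do
        return h
    # spherical linear interpolation between u and v
    w = (np.sin((1.0 - alpha) * theta) * u + np.sin(alpha * theta) * v) / np.sin(theta)
    return norm * w                       # same norm as the input activation

def confidence_gate(logits, alpha_max=0.5):
    """Hypothetical gate: scale steering strength by normalized entropy,
    so uncertain inputs are steered harder than confident ones."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))  # in [0, 1]
    return alpha_max * entropy
```

Because the output norm always equals the input norm, the operation cannot inflate or collapse the representation's magnitude, which is the geometric property the abstract contrasts with activation addition.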
Related papers
- Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training [1.0518862318418603]
In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. We investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model.
arXiv Detail & Related papers (2026-02-09T07:14:28Z) - Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection [1.7802147489386628]
Large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. We propose Selective Steering, which addresses these limitations through two key innovations. Experiments across nine models demonstrate that Selective Steering achieves 5.5x higher attack success rates than prior methods.
arXiv Detail & Related papers (2026-01-27T08:56:25Z) - From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior. We demonstrate how uncertainty is leveraged as an active control signal across three frontiers. This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z) - Rotation-Robust Regression with Convolutional Model Trees [11.143798306106362]
We study rotation-robust learning for image inputs using Convolutional Model Trees (CMTs). We introduce three geometry-aware inductive biases for split directions and quantify their impact on robustness under in-plane rotations. We observe consistent trends on MNIST digit recognition implemented as one-vs-rest regression.
arXiv Detail & Related papers (2026-01-08T12:53:33Z) - Deep Delta Learning [91.75868893250662]
We introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection. We provide a spectral analysis of this operator, demonstrating that the input-dependent gate enables dynamic interpolation between identity mapping, projection, and geometric reflection. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics.
arXiv Detail & Related papers (2026-01-01T18:11:38Z) - Angular Steering: Behavior Control via Rotation in Activation Space [1.3400719989424488]
Angular Steering is a novel and flexible method for behavior modulation. It operates by rotating activations within a fixed two-dimensional subspace. It provides continuous, fine-grained control over behaviors such as refusal and compliance.
arXiv Detail & Related papers (2025-10-30T08:23:35Z) - ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose an end-to-end Energy-Regularized Information for Shift-Robustness framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z) - Equivariant Goal Conditioned Contrastive Reinforcement Learning [5.019456977535218]
Contrastive Reinforcement Learning (CRL) provides a promising framework for extracting useful structured representations from unlabeled interactions. We propose Equivariant CRL, which further structures the latent space using equivariant constraints. Our approach consistently outperforms strong baselines across a range of simulated tasks in both state-based and image-based settings.
arXiv Detail & Related papers (2025-07-22T01:13:45Z) - PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in GAN training arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.