Summary
This theme tracks activation steering as an inference-time method to control and adapt language models without modifying parameters. Representative work spans extracting contrastive instruction vectors, generating steering vectors via hypernetworks for unseen prompts, and composing reusable semantic directions with minimal data. This week's progress questions whether single linear directions suffice, proposing richer geometric and optimization-based accounts of steering.
Situation
The introductions of the representative papers frame a shared problem: prompt engineering is brittle, fine-tuning is costly, and it remains unclear how models internally encode controllable behaviors. Activation-space interventions occupy a practical middle ground for shaping model behavior at inference time, particularly for instruction adherence, domain adaptation, and multi-constraint control.
The field is shifting from single handcrafted steering directions toward richer strategies. One line extracts contrastive vectors for verifiable instructions such as format, length, or keyword constraints. A second learns hypernetworks that produce task-targeted steering vectors for unseen prompts, scaling supervised steering without per-task training. A third searches for task-specific mixtures of reusable semantic basis vectors from only a few examples. Supplemental evidence suggests that steering effectiveness may depend on sample-specific geometry rather than a single clean linear concept direction, motivating more structured accounts of internal representations.
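The contrastive line of work can be pictured with a minimal sketch. It assumes that a steering vector is the difference of mean activations at a single layer and that it is applied additively. The `compose` helper stands in for the basis-mixture approach, but its mixture weights are simply taken as given rather than searched from a few examples. The function names (`contrastive_vector`, `compose`, `steer`) and the toy 2-d activations are hypothetical; real activations would come from a model's hidden states.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def contrastive_vector(with_instr: List[Vector], without_instr: List[Vector]) -> Vector:
    """Difference of means: activations with the instruction minus those without it."""
    pos, neg = mean(with_instr), mean(without_instr)
    return [p - n for p, n in zip(pos, neg)]


def compose(basis: List[Vector], weights: List[float]) -> Vector:
    """Weighted mixture of reusable basis directions (weights assumed given here)."""
    return [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(len(basis[0]))]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Additive intervention at one layer: h' = h + alpha * v."""
    return [h + alpha * v for h, v in zip(hidden, direction)]


# Toy 2-d activations: two inputs with the instruction, two without it.
v = contrastive_vector([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [2.0, 2.0]])
print(v)                          # [1.0, 0.0]
print(steer([0.5, 0.5], v, 2.0))  # [2.5, 0.5]
```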
Progress
The Cylindrical Representation Hypothesis for Language Model Steering
Proposes the Cylindrical Representation Hypothesis, modeling concept representations as having a central axis and an orthogonal normal plane with sample-specific sensitive sectors. Provides a geometric explanation for why steering outcomes vary across samples, moving beyond the assumption of a single global linear concept direction.
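The cylindrical picture can be illustrated with a minimal geometric sketch. It assumes the central axis is given as a vector and reads a "sensitive sector" off as an angular bin within a chosen 2-d slice of the normal plane. The sector definition, the binning rule, and the helper names are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cylindrical_coords(hidden: Vector, axis: Vector) -> Tuple[float, Vector, float]:
    """Split h into an axial coordinate along the unit axis and a normal-plane component."""
    length = math.sqrt(dot(axis, axis))
    unit = [a / length for a in axis]
    axial = dot(hidden, unit)
    normal = [h - axial * u for h, u in zip(hidden, unit)]
    return axial, normal, math.sqrt(dot(normal, normal))


def sector_index(normal: Vector, e1: Vector, e2: Vector, num_sectors: int) -> int:
    """Angular bin of the normal component in the slice spanned by orthonormal e1, e2.

    Hypothetical: the paper's own definition of sensitive sectors may differ.
    """
    angle = math.atan2(dot(normal, e2), dot(normal, e1)) % (2 * math.pi)
    # Guard against rounding that would put angle exactly at 2*pi.
    return min(int(angle // (2 * math.pi / num_sectors)), num_sectors - 1)


axial, normal, radius = cylindrical_coords([3.0, 4.0, 0.0], [2.0, 0.0, 0.0])
print(axial, radius)  # 3.0 4.0
```

Adding a multiple of the unit axis to `hidden` changes only the axial coordinate and leaves the normal component, and hence the sector, untouched. In this picture, two samples pushed by the same amount along the axis can still sit in different sectors, which is one way to read why steering outcomes vary per sample.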
Conceptors for Semantic Steering
Replaces single steering vectors with conceptors—soft projection matrices estimated from activations spanning both poles of a concept. Broadens the intervention space from one-dimensional directions to subspace-level projections, offering a richer geometric model of concept control.
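The conceptor has a standard closed form from the conceptor literature: C = R (R + alpha^-2 I)^-1, where R is the correlation matrix of the collected activations and alpha is the aperture. The sketch computes it in pure Python for small dimensions. The blending rule in `apply_conceptor` (the `beta` parameter) is a hypothetical choice for how the soft projection is applied, not necessarily the paper's.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def correlation(samples: Matrix) -> Matrix:
    """R = (1/n) * sum_i x_i x_i^T over activation samples (one sample per row)."""
    n, d = len(samples), len(samples[0])
    return [[sum(x[i] * x[j] for x in samples) / n for j in range(d)] for i in range(d)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; fine for small dense matrices only."""
    d = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conceptor(samples: Matrix, aperture: float) -> Matrix:
    """C = R (R + aperture^-2 I)^-1: a soft projection onto the samples' dominant subspace."""
    r = correlation(samples)
    d = len(r)
    reg = [[r[i][j] + (aperture ** -2 if i == j else 0.0) for j in range(d)] for i in range(d)]
    return matmul(r, invert(reg))


def apply_conceptor(c: Matrix, hidden: Vector, beta: float) -> Vector:
    """Hypothetical blend h' = (1 - beta) h + beta C h; beta = 1 is the pure soft projection."""
    ch = [sum(c[i][j] * hidden[j] for j in range(len(hidden))) for i in range(len(c))]
    return [(1.0 - beta) * h + beta * p for h, p in zip(hidden, ch)]


# Samples at both poles of a concept lying along dimension 0.
C = conceptor([[1.0, 0.0], [-1.0, 0.0]], aperture=10.0)
print(apply_conceptor(C, [2.0, 3.0], beta=1.0))
```

Directions the samples do not span are shrunk to zero, while well-covered directions pass through almost unchanged. Raising the aperture makes the projection harder.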
Steer Like the LLM: Activation Steering that Mimics Prompting
Formulates prompt-based steering as a special case of activation steering, deriving latent interventions that reproduce prompting effects. Narrows the performance gap between prompt-based control and activation-level interventions by grounding both in a unified framework.
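One way to see the link between prompting and activation steering is to treat the activation change a prompt causes as an input-dependent steering vector. The diagnostic below is a hypothetical sketch, not the paper's derivation. It averages per-input, prompt-induced shifts into one static vector and measures how much of the prompt's effect that static vector misses. All function names are assumptions.

```python
from typing import List

Vector = List[float]


def prompt_shifts(with_prompt: List[Vector], without_prompt: List[Vector]) -> List[Vector]:
    """Per-input activation change caused by prepending the steering prompt."""
    return [[a - b for a, b in zip(w, wo)] for w, wo in zip(with_prompt, without_prompt)]


def mean_shift(shifts: List[Vector]) -> Vector:
    """A single static steering vector: the average prompt-induced shift."""
    return [sum(col) / len(shifts) for col in zip(*shifts)]


def residual_error(shifts: List[Vector], vector: Vector) -> float:
    """Mean squared gap between each per-input shift and one shared vector."""
    return sum(sum((s - v) ** 2 for s, v in zip(shift, vector)) for shift in shifts) / len(shifts)


shifts = prompt_shifts([[2.0, 1.0], [4.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
static = mean_shift(shifts)
print(static, residual_error(shifts, static))  # [2.0, 0.0] 1.0
```

A nonzero residual means the prompt's effect varies across inputs in a way a single fixed vector cannot reproduce.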
Minimizing Collateral Damage in Activation Steering
Formalizes collateral damage—unintended changes to non-target feature directions—and casts steering as a constrained optimization problem. Shifts the focus from merely achieving behavioral control to preserving non-target features, enabling more selective interventions.
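The constrained view has a simple closed form under one assumption: collateral damage is modeled as any change along a known set of protected (non-target) directions and is treated as a hard linear constraint. Under that assumption, the minimum-norm vector that moves the target projection by `shift` while leaving every protected projection unchanged is the target direction with its protected components removed, then rescaled. This is a stand-in for the paper's constrained optimization, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(directions: List[Vector]) -> List[Vector]:
    """Gram-Schmidt; skips directions that are numerically dependent on earlier ones."""
    basis: List[Vector] = []
    for d in directions:
        w = list(d)
        for b in basis:
            c = dot(w, b)
            w = [x - c * y for x, y in zip(w, b)]
        length = math.sqrt(dot(w, w))
        if length > 1e-10:
            basis.append([x / length for x in w])
    return basis


def selective_steering_vector(target: Vector, protected: List[Vector], shift: float) -> Vector:
    """Minimum-norm v with target . v = shift and protected_k . v = 0 for every protected k."""
    p = list(target)
    for b in orthonormalize(protected):
        c = dot(p, b)
        p = [x - c * y for x, y in zip(p, b)]  # remove the protected component
    denom = dot(p, p)  # equals target . p, because p is target's projection
    if denom < 1e-12:
        raise ValueError("target lies in the span of the protected directions")
    return [shift * x / denom for x in p]


v = selective_steering_vector([1.0, 1.0, 0.0], [[0.0, 1.0, 0.0]], shift=2.0)
print(v)  # [2.0, 0.0, 0.0]
```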
Referring Multiple Regions with Large Multimodal Models via Contextual Latent Steering
Introduces CSteer, a training-free contextual latent steering method that enables general multimodal models to refer to multiple image regions. Extends activation steering from text-only instruction following to multimodal visual grounding without task-specific fine-tuning.
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Studies manifold-level steering and demonstrates that internal representation geometry corresponds to behavioral control in reasoning and in-context learning tasks. Suggests that effective steering may require respecting richer manifold structure rather than relying solely on linear directions.
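The manifold-steering paper's construction is not reproduced here. As a deliberately simple substitute that respects one piece of geometry, the sketch moves an activation along the great circle toward a target direction while keeping its norm fixed (spherical linear interpolation). Additive steering, by contrast, also changes the norm. Treating the activation norm as the quantity to preserve is an assumption made for illustration, and `slerp_steer` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def slerp_steer(hidden: Vector, target: Vector, t: float) -> Vector:
    """Rotate h a fraction t of the way toward target's direction, keeping ||h|| fixed."""
    nh, nt = norm(hidden), norm(target)
    u = [x / nh for x in hidden]
    w = [x / nt for x in target]
    theta = math.acos(max(-1.0, min(1.0, dot(u, w))))  # clamp against rounding
    if theta < 1e-8:
        return list(hidden)  # already aligned
    if math.pi - theta < 1e-8:
        raise ValueError("antipodal directions: the great circle is not unique")
    s = math.sin(theta)
    a, b = math.sin((1.0 - t) * theta) / s, math.sin(t * theta) / s
    return [nh * (a * x + b * y) for x, y in zip(u, w)]


out = slerp_steer([2.0, 0.0], [0.0, 5.0], 0.5)
print(out, norm(out))
```

Here `t` plays the role of steering strength: `t = 0` leaves the activation unchanged and `t = 1` aligns it fully with the target direction at its original norm.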
Outlook
Outlook Summary
Near-term activation steering is likely to shift from fixed, single-vector edits toward adaptive and geometry-aware control. Likely next steps include tuning steering weights dynamically, filtering out weak instruction representations, and using small example sets more efficiently. This week's work supports that shift through cylindrical, manifold, and conceptor-based projection views, plus explicit constraints on collateral damage to unrelated abilities. A second direction is broader scaling across tasks, model families, and modalities while keeping interventions selective. Researchers are likely to test reusable control spaces, cross-model transfer, stronger quality metrics, sparsity, and subspace constraints so that steering remains interpretable and reliable.
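Dynamic steering weights and filtering of weak representations could look like the following hypothetical rule. The activation is pushed only until its projection on the steering direction reaches a target value, the push is capped, and directions whose norm is too small to trust are discarded. The thresholds and function names are illustrative assumptions, not a method from the cited work.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_alpha(hidden: Vector, unit_dir: Vector, target_proj: float, max_alpha: float) -> float:
    """Push only as far as needed: alpha = clip(target_proj - h . u, 0, max_alpha)."""
    return max(0.0, min(max_alpha, target_proj - dot(hidden, unit_dir)))


def keep_direction(direction: Vector, min_norm: float) -> bool:
    """Filter out contrastive directions whose norm is too small to be a reliable signal."""
    return math.sqrt(dot(direction, direction)) >= min_norm


def steer_adaptive(hidden: Vector, unit_dir: Vector, target_proj: float, max_alpha: float) -> Vector:
    """Per-input steering weight instead of one fixed alpha for every input."""
    alpha = adaptive_alpha(hidden, unit_dir, target_proj, max_alpha)
    return [h + alpha * u for h, u in zip(hidden, unit_dir)]


print(steer_adaptive([1.0, 0.5], [1.0, 0.0], 3.0, 5.0))  # [3.0, 0.5]
print(steer_adaptive([4.0, 0.5], [1.0, 0.0], 3.0, 5.0))  # [4.0, 0.5]
```

Inputs that already express the behavior receive no push, which is one simple way to limit over-steering.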
Three-Year Movement
Over the next year, activation steering is likely to move from simple fixed vectors toward adaptive methods that respect the shape of internal representations. Researchers will test whether subspaces, local regions, projection methods, and filters can improve a target behavior without damaging unrelated skills. This sets up the three-year movement: steering becomes a practical control layer only when each intervention is measured against its collateral damage. By around 36 months, the main progress is likely to lie less in proving that steering can change outputs and more in defining where it is safe to use. Teams will validate steering model by model, layer by layer, and task by task, because transfer cannot be assumed. The base case is useful but cautious adoption, with steering treated as a controlled adjustment rather than a general replacement for training, prompting, or safety evaluation.
In the first year, the shift toward geometry-aware steering turns into paired evaluation: tests for the desired behavior are run beside tests for factuality, reasoning, safety boundaries, tool use, and multimodal grounding. Researchers compare manifold, projection, subspace, and composed-vector methods, while tools begin to package steering settings with model version, layer, weight range, prompt distribution, and side-effect results. In the second year, these practices become a normal quality-assurance layer for dynamic interventions. Steering profiles are treated like versioned deployment artifacts, and managed services may combine vector generation with automated regression scans. By around 36 months, the key movement is toward validation authority. Research maps when steering transfers across model sizes, architectures, modalities, and version changes, and when it must be re-tested. Adoption grows most where providers can document selectivity, regression limits, transfer boundaries, monitoring, and rollback behavior.
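The packaging and paired-evaluation practice described above can be sketched as a small data structure. The `SteeringProfile` fields mirror the listed items (model version, layer, weight range, prompt distribution, side-effect results). The acceptance rule, the field names, and every value in the example are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SteeringProfile:
    """Hypothetical versioned artifact bundling a steering setting with its evaluation."""
    model_version: str
    layer: int
    weight_range: Tuple[float, float]
    prompt_distribution: str
    target_gain: float  # change on the desired behavior vs. the unsteered model
    # Change per side-effect metric vs. the unsteered model (negative = regression).
    side_effects: Dict[str, float] = field(default_factory=dict)

    def passes(self, min_gain: float, max_regression: float) -> bool:
        """Paired check: the target must improve AND no side metric may regress past tolerance."""
        worst = min(self.side_effects.values(), default=0.0)
        return self.target_gain >= min_gain and worst >= -max_regression


profile = SteeringProfile(
    model_version="example-model-v1",
    layer=12,
    weight_range=(0.5, 4.0),
    prompt_distribution="format-constraint prompts",
    target_gain=0.12,
    side_effects={"factuality": -0.01, "safety_refusals": 0.0, "tool_use": -0.005},
)
print(profile.passes(min_gain=0.05, max_regression=0.02))  # True
```

Keeping the side-effect results inside the same artifact as the steering setting means a regression scan can reject a profile without consulting any external record.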
In the first year, the control-theory path would reframe a model's layer-by-layer computation as a state trajectory that can be monitored and corrected. Instead of only adding a steering vector, researchers would ask whether activations are drifting outside a useful or safe region and how a feedback controller should respond. Early applications would be tooling for open-weights models that compares closed-loop steering with prompts, static vectors, and classifier guardrails. In the second year, if results replicate, libraries and benchmarks could connect transformer models with controller training, control simulations, and trajectory-level tests. Control barrier functions, which define a safe region and constrain each update so that the state does not leave it, would be tested in latency-tolerant settings such as regulated drafting and safety-sensitive tool use. By around 36 months, the movement would be toward a split between the base model and a runtime control plane. This path advances only if controllers outperform simpler safeguards under realistic latency and adversarial testing.
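The closed-loop idea with a control barrier function can be sketched in its simplest discrete-time form. The sketch assumes the safe region is a single half-space, `normal . x <= limit`, over activations, and that each layer is a toy function. After every layer, the controller makes the minimum-norm correction needed to satisfy the discrete barrier condition b(x_next) >= (1 - gamma) b(x_prev), where b(x) = limit - normal . x. Real activation spaces, safe regions, and controllers would be far more complex, and all names here are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def barrier(x: Vector, normal: Vector, limit: float) -> float:
    """b(x) = limit - normal . x; the safe set is b(x) >= 0."""
    return limit - dot(normal, x)


def cbf_correct(x_prev: Vector, x_next: Vector, normal: Vector, limit: float, gamma: float) -> Vector:
    """Minimum-norm change to x_next so that b(x_next) >= (1 - gamma) * b(x_prev)."""
    required = (1.0 - gamma) * barrier(x_prev, normal, limit)
    deficit = required - barrier(x_next, normal, limit)
    if deficit <= 0:
        return list(x_next)  # condition already holds: no intervention
    scale = deficit / dot(normal, normal)
    return [xi - scale * ni for xi, ni in zip(x_next, normal)]


def closed_loop(x0: Vector, layers: List[Callable[[Vector], Vector]],
                normal: Vector, limit: float, gamma: float) -> List[Vector]:
    """Run the layers, correcting the state after each one; return the corrected trajectory."""
    trajectory, x = [], list(x0)
    for layer in layers:
        x = cbf_correct(x, layer(x), normal, limit, gamma)
        trajectory.append(x)
    return trajectory


# Toy "layers" that each push the 1-d state by +2.0 toward the unsafe side.
layers = [lambda x: [x[0] + 2.0]] * 2
print(closed_loop([0.0], layers, normal=[1.0], limit=1.0, gamma=0.5))  # [[0.5], [0.75]]
```

For gamma between 0 and 1 and a safe starting state, each step may move the state toward the boundary but never past it, which is the property a barrier function is meant to guarantee.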
References
- Improving Instruction-Following in Language Models through Activation Steering - Authors: Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi / License: CC-BY-4.0
- HyperSteer: Activation Steering at Scale with Hypernetworks - Authors: Jiuding Sun, Sidharth Baskaran, Zhengxuan Wu, Michael Sklar, Christopher Potts, Atticus Geiger / License: CC-BY-4.0
- Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs - Authors: Pengrui Han, Xueqiang Xu, Keyang Xuan, Peiyang Song, Siru Ouyang, Runchu Tian, Yuqing Jiang, Cheng Qian, Pengcheng Jiang, Jiashuo Sun, Junxia Cui, Ming Zhong, Ge Liu, Jiawei Han, Jiaxuan You / License: CC-BY-4.0