Weight Updates as Activation Shifts: A Principled Framework for Steering
- URL: http://arxiv.org/abs/2603.00425v1
- Date: Sat, 28 Feb 2026 02:50:04 GMT
- Title: Weight Updates as Activation Shifts: A Principled Framework for Steering
- Authors: Dyah Adila, John Cooper, Alexander Yun, Avi Trost, Frederic Sala,
- Abstract summary: Activation steering promises to be an extremely parameter-efficient form of adaptation, but its effectiveness depends on critical design choices. We establish a first-order equivalence between activation-space interventions and weight-space updates, deriving the conditions under which activation steering can replicate fine-tuning behavior. This equivalence yields a principled framework for steering design and identifies the post-block output as a theoretically-backed and highly expressive intervention site.
- Score: 54.70188910511715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation steering promises to be an extremely parameter-efficient form of adaptation, but its effectiveness depends on critical design choices -- such as intervention location and parameterization -- that currently rely on empirical heuristics rather than a principled foundation. We establish a first-order equivalence between activation-space interventions and weight-space updates, deriving the conditions under which activation steering can replicate fine-tuning behavior. This equivalence yields a principled framework for steering design and identifies the post-block output as a theoretically-backed and highly expressive intervention site. We further explain why certain intervention locations outperform others and show that weight updates and activation updates play distinct, complementary functional roles. This analysis motivates a new approach -- joint adaptation -- that trains in both spaces simultaneously. Our post-block steering achieves accuracy within 0.2%-0.9% of full-parameter tuning, on average across tasks and models, while training only 0.04% of model parameters. It consistently outperforms prior activation steering methods such as ReFT and PEFT approaches including LoRA, while using significantly fewer parameters. Finally, we show that joint adaptation often surpasses the performance ceilings of weight and activation updates in isolation, introducing a new paradigm for efficient model adaptation.
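To make the claimed first-order equivalence concrete, here is a minimal sketch of one standard argument; the paper's exact assumptions, notation, and conditions may differ. A post-block shift applied to an output h = Wx can be reproduced on that input by a rank-one weight update, while a weight update is matched by a fixed shift only to first order:

```latex
% Minimal sketch (generic first-order argument, not necessarily the paper's exact derivation).
% h = W x is a block output, \delta a post-block steering vector, x a nonzero reference input.
\[
  \Delta W \;=\; \frac{\delta\, x^{\top}}{\lVert x \rVert^{2}}
  \quad\Longrightarrow\quad
  (W + \Delta W)\, x \;=\; W x + \frac{\delta\,(x^{\top} x)}{\lVert x \rVert^{2}} \;=\; h + \delta .
\]
% Conversely, a small update \Delta W changes the output by \Delta W x, an input-dependent
% quantity that a single fixed shift \delta can match only to first order around x.
```

The post-block intervention site itself is easy to prototype. The hedged sketch below wraps each transformer block with one trainable shift vector; the Llama-style `model.model.layers` layout, zero initialization, and constant (input-independent) shift are assumptions for illustration, not the paper's exact recipe, whose parameterization is likely richer.

```python
# Hedged sketch of post-block steering with one trainable additive vector per block.
import torch
import torch.nn as nn

class PostBlockSteering(nn.Module):
    """Adds a learned vector to a transformer block's output (post-block site)."""
    def __init__(self, block: nn.Module, hidden_size: int):
        super().__init__()
        self.block = block
        self.shift = nn.Parameter(torch.zeros(hidden_size))  # the only trainable part

    def forward(self, *args, **kwargs):
        out = self.block(*args, **kwargs)
        if isinstance(out, tuple):  # HF blocks often return (hidden_states, ...)
            return (out[0] + self.shift,) + out[1:]
        return out + self.shift

def wrap_blocks(model, hidden_size):
    for p in model.parameters():                     # freeze the base model
        p.requires_grad_(False)
    for i, block in enumerate(model.model.layers):   # Llama-style layout (assumed)
        model.model.layers[i] = PostBlockSteering(block, hidden_size)
    return model
```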
Related papers
- Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics [6.208369829942616]
We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm. ULD unifies the efficiency of model-free methods with the representational strengths of model-based approaches. It is evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari.
arXiv Detail & Related papers (2026-02-13T06:06:56Z)
- Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training [1.0518862318418603]
In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. We investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model. (A short activation-pattern sketch follows this entry.)
arXiv Detail & Related papers (2026-02-09T07:14:28Z)
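For readers unfamiliar with the term, the "activation pattern" in the entry above is just the binary on/off mask of ReLU units. A hedged sketch (illustrative layer sizes and perturbation scale, not that paper's setup) shows that nearby inputs sharing a pattern lie in the same affine region:

```python
# Hedged sketch: the binary ReLU activation pattern of an input identifies the
# piecewise-linear region in which a ReLU MLP acts affinely.
import torch
import torch.nn as nn

def activation_pattern(mlp: nn.Sequential, x: torch.Tensor) -> list[torch.Tensor]:
    """Return one boolean mask per ReLU layer: True where the unit is active."""
    pattern, h = [], x
    for layer in mlp:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            pattern.append(h > 0)
    return pattern

mlp = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(4)
p1 = activation_pattern(mlp, x)
p2 = activation_pattern(mlp, x + 1e-4 * torch.randn(4))  # tiny perturbation of the same input
same_region = all(torch.equal(a, b) for a, b in zip(p1, p2))
print(same_region)  # usually True: both inputs fall in the same affine region
```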
- Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation. We present a unified view that frames these interventions as dynamic weight updates induced by a control signal. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
- From Coefficients to Directions: Rethinking Model Merging with Directional Alignment [66.99062575537555]
We introduce a unified geometric framework, Merging with Directional Alignment, which aligns directional structures consistently in both the parameter and feature spaces. Our analysis shows that directional alignment improves structural coherence, and extensive experiments across benchmarks, model scales, and task configurations further validate the effectiveness of our approach.
arXiv Detail & Related papers (2025-11-29T08:40:58Z)
- Weight Spectra Induced Efficient Model Adaptation [54.8615621415845]
Fine-tuning large-scale foundation models incurs prohibitive computational costs. We show that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact. We propose a novel method that leverages learnable rescaling of top singular directions (a minimal sketch follows this entry).
arXiv Detail & Related papers (2025-05-29T05:03:29Z)
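A hedged sketch of the idea described above, assuming a simple SVD reparameterization with learnable scales on the top-k singular values; the class name, default k, and initialization are illustrative, not that paper's implementation:

```python
# Hedged sketch: freeze a weight matrix's singular vectors and train only k scale
# factors on its top singular values, so adaptation can amplify dominant directions.
import torch
import torch.nn as nn

class TopSpectrumRescale(nn.Module):
    def __init__(self, weight: torch.Tensor, k: int = 8):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)    # frozen left singular vectors
        self.register_buffer("S", S)    # frozen singular values (k <= len(S) assumed)
        self.register_buffer("Vh", Vh)  # frozen right singular vectors
        self.k = k
        self.scale = nn.Parameter(torch.ones(k))  # only k trainable scalars

    def weight(self) -> torch.Tensor:
        scale = torch.cat([self.scale, torch.ones_like(self.S[self.k:])])
        return self.U @ torch.diag(self.S * scale) @ self.Vh  # rescaled reconstruction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight().T
```

A frozen linear layer can be swapped for this module; only k scalars per weight matrix are trained.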
- Dynamic Adaptation of LoRA Fine-Tuning for Efficient and Task-Specific Optimization of Large Language Models [0.7421845364041001]
This paper presents dynamic LoRA, a novel fine-tuning methodology for large language models. It adds dynamic adaptation mechanisms to improve efficiency and performance. The efficiency of dynamic LoRA was validated in experiments on benchmark datasets (a standard LoRA layer is sketched after this entry for context).
arXiv Detail & Related papers (2025-01-24T18:54:14Z)
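For context on what is being made dynamic, a minimal standard LoRA layer is sketched below; the paper's dynamic rank/scale adaptation is not reproduced, and `r` and `alpha` are illustrative defaults:

```python
# Hedged sketch of a standard LoRA layer (background only, not that paper's method).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)               # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scaling = alpha / r                  # a dynamic variant could adapt r or scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```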
- Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement [0.7558576228782637]
We propose a framework for efficient Source-Free Domain Adaptation (SFDA). Our approach introduces an improved paradigm for source-model preparation and target-side adaptation. We demonstrate that our framework is compatible with various SFDA methods and achieves significant computational efficiency.
arXiv Detail & Related papers (2024-10-03T02:12:03Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models. We propose a novel model fine-tuning method to make full use of these ineffective parameters. Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer (the classic RANSAC loop it builds on is sketched after this entry).
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
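As background for the entry above, here is a hedged sketch of the classic RANSAC loop that the consensus-adaptive variant extends, using plain random sampling plus inlier counting on a 2D line-fitting example; the paper's attention-based state update is not included:

```python
# Hedged sketch of vanilla RANSAC for robust 2D line fitting (background only).
import numpy as np

def ransac_line(points: np.ndarray, iters: int = 200, thresh: float = 0.05, rng=None):
    """points: (N, 2) array. Returns (a, b, c) for the line a*x + b*y + c = 0."""
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]  # minimal sample
        d = p2 - p1
        n = np.array([-d[1], d[0]])                 # normal to the sampled line
        norm = np.linalg.norm(n)
        if norm == 0:
            continue                                # degenerate sample, resample
        n = n / norm
        c = -n @ p1
        residuals = np.abs(points @ n + c)          # point-to-line distances
        inliers = int((residuals < thresh).sum())   # consensus score
        if inliers > best_inliers:
            best_model, best_inliers = (n[0], n[1], c), inliers
    return best_model
```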