Related papers: PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration


URL: http://arxiv.org/abs/2510.10205v1
Date: Sat, 11 Oct 2025 13:13:34 GMT
Title: PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
Authors: Manjiang Yu, Hongji Li, Priyanka Singh, Xue Li, Di Wang, Lijie Hu
Abstract summary: We propose a position-wise activation steering framework for large language models (LLMs) on the web. PIXEL learns a property-aligned subspace from dual views and selects intervention strength via a constrained geometric objective. PIXEL consistently improves attribute alignment while preserving model general capabilities.
Score: 17.225716209866086
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reliable behavior control is central to deploying large language models (LLMs) on the web. Activation steering offers a tuning-free route to align attributes (e.g., truthfulness) that ensure trustworthy generation. Prevailing approaches rely on coarse heuristics and lack a principled account of where to steer and how strongly to intervene. To this end, we propose Position-wise Injection with eXact Estimated Levels (PIXEL), a position-wise activation steering framework that, in contrast to prior work, learns a property-aligned subspace from dual views (tail-averaged and end-token) and selects intervention strength via a constrained geometric objective with a closed-form solution, thereby adapting to token-level sensitivity without global hyperparameter tuning. PIXEL further performs sample-level orthogonal residual calibration to refine the global attribute direction and employs a lightweight position-scanning routine to identify receptive injection sites. We additionally provide representation-level guarantees for the minimal-intervention rule, supporting reliable alignment. Across diverse models and evaluation paradigms, PIXEL consistently improves attribute alignment while preserving model general capabilities, offering a practical and principled method for LLMs' controllable generation. Our code is available at https://github.com/V1centNevwake/PIXEL-Adaptive-Steering
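The abstract describes choosing the intervention strength per position in closed form rather than through a global hyperparameter. The sketch below illustrates that general idea only; it is not taken from the paper or its repository. It assumes a single steering direction and a hypothetical rule: at each chosen position, add the smallest non-negative multiple of the unit direction that lifts the hidden state's projection onto that direction up to a target value. The names minimal_strength, steer_positions, and target are illustrative assumptions.

```python
import math


def unit(v):
    """Return v scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def minimal_strength(h, direction, target):
    """Smallest alpha >= 0 with <h + alpha * d, d> >= target, for unit d.

    Assumed illustrative rule, not PIXEL's actual objective: because d has
    unit norm, the projection grows by exactly alpha, so the closed form
    is max(0, target - <h, d>).
    """
    d = unit(direction)
    proj = sum(a * b for a, b in zip(h, d))
    return max(0.0, target - proj)


def steer_positions(hidden, direction, target, positions):
    """Shift only the listed token positions; leave all others unchanged."""
    d = unit(direction)
    steered = [list(h) for h in hidden]
    for t in positions:
        alpha = minimal_strength(hidden[t], d, target)
        steered[t] = [x + alpha * y for x, y in zip(hidden[t], d)]
    return steered


# Toy hidden states: three token positions, two dimensions each.
hidden = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
steered = steer_positions(hidden, direction=[2.0, 0.0], target=2.0, positions=[0, 2])
print(steered)
```

In this toy run, position 0 is moved just far enough to reach the target projection, position 2 already exceeds it and receives zero strength, and position 1 is not an injection site, so it is untouched.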

Related papers

IoUCert: Robustness Verification for Anchor-based Object Detectors [58.35703549470485]
We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in anchor-based object detection architectures. We show that our method enables the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.
arXiv Detail & Related papers (2026-03-03T14:36:46Z)
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions [37.08071497197165]
Intervention-based model steering offers a lightweight and interpretable alternative to prompting and fine-tuning. We build on the principles of distributed alignment search to propose a new steering method: Concept DAS. We show that Concept DAS does not always outperform preference-optimization methods but may benefit more from increased model scale.
arXiv Detail & Related papers (2026-02-05T02:51:00Z)
Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition [15.33833908429706]
Thresholds are typically hand-tuned offline for a specific environment and fixed during deployment, leading to degraded performance under environmental change. We propose a method that, given a user-defined precision requirement, automatically selects the operating point of a VPR system to maximise recall. Experiments with multiple state-of-the-art VPR techniques and datasets show that the proposed approach consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2026-02-04T10:31:29Z)
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation. We present a unified view that frames these interventions as dynamic weight updates induced by a control signal. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
Alignment-Aware Model Adaptation via Feedback-Guided Optimization [27.93864970404945]
Fine-tuning is the primary mechanism for adapting foundation models to downstream tasks. We propose an alignment-aware fine-tuning framework that integrates feedback from an external alignment signal through policy-gradient-based regularization.
arXiv Detail & Related papers (2026-02-02T16:03:16Z)
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection [1.7802147489386628]
Large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. We propose Selective Steering, which addresses these limitations through two key innovations. Experiments across nine models demonstrate that Selective Steering achieves 5.5x higher attack success rates than prior methods.
arXiv Detail & Related papers (2026-01-27T08:56:25Z)
Activation Steering with a Feedback Controller [4.609594868699996]
Proportional-Integral-Derivative (PID) Steering is a principled framework that leverages the full PID controller for activation steering in large language models. PID Steering consistently outperforms existing approaches, achieving more robust and reliable behavioral control.
arXiv Detail & Related papers (2025-10-05T18:05:28Z)
An Adaptive ICP LiDAR Odometry Based on Reliable Initial Pose [11.704772923028976]
Iterative Closest Point (ICP)-based methods have become the core technique in LiDAR odometry. The absence of an adaptive mechanism hinders the effective handling of complex dynamic environments. This paper proposes an adaptive ICP-based LiDAR odometry method that relies on a reliable initial pose.
arXiv Detail & Related papers (2025-09-26T08:40:53Z)
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z)
Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios [54.58186816693791]
Environments constantly change over time and space, posing significant challenges for object detectors trained based on a closed-set assumption. We propose a new mechanism, converting the fine-tuning process to a specific-parameter generation. In particular, we first design a dual-path LoRA-based domain-aware adapter that disentangles features into domain-invariant and domain-specific components.
arXiv Detail & Related papers (2025-06-30T17:14:12Z)
SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
SeqPE represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM) and accuracy, but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign.
arXiv Detail & Related papers (2025-06-16T09:16:40Z)
Uncertainty-Guided Alignment for Unsupervised Domain Adaptation in Regression [5.437298646956505]
Unsupervised Domain Adaptation for Regression (UDAR) aims to adapt models from a labeled source domain to an unlabeled target domain for regression tasks. Traditional feature alignment methods, successful in classification, often prove ineffective for regression due to the correlated nature of regression features. We propose Uncertainty-Guided Alignment (UGA), a novel method that integrates predictive uncertainty into the feature alignment process.
arXiv Detail & Related papers (2024-01-24T14:55:02Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process. We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.