Inference-time Alignment via Sparse Junction Steering
- URL: http://arxiv.org/abs/2602.21215v1
- Date: Fri, 30 Jan 2026 08:40:47 GMT
- Title: Inference-time Alignment via Sparse Junction Steering
- Authors: Runyi Hu, Jie Zhang, Shiqian Zhao, Jiale Meng, Jiwei Li, Jason Zeng, Ming Wu, Michael Heinrich, Yonggang Wen, Tianwei Zhang
- Abstract summary: Token-level steering has emerged as a pivotal approach for inference-time alignment. Existing methods rely on dense intervention at every decoding step. We show that dense intervention is unnecessary and propose sparse junction steering.
- Score: 25.464612964225484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Token-level steering has emerged as a pivotal approach for inference-time alignment, enabling fine-grained control over large language models by modulating their output distributions without parameter updates. While effective, existing methods rely on dense intervention at every decoding step. This persistent manipulation not only incurs substantial computational overhead but also risks compromising generation quality by drifting excessively from the model's intrinsic distribution. In this work, we show that dense intervention is unnecessary and propose Sparse Inference-time Alignment (SIA), which performs sparse junction steering by intervening only at critical decision points along the generation trajectory. Our key insight is that high-entropy junctions mark pivotal decision points in the generation trajectory and are particularly susceptible to misalignment, indicating the need to introduce alignment-related reward signals at these points. Extensive experiments across different model families and alignment objectives show that steering only 20% to 80% of tokens achieves superior alignment-efficiency trade-offs. For strong base models such as Qwen3, intervening on as few as 20% of tokens matches or even surpasses heavily post-trained instruct models. This sparsity enables stronger guidance while better preserving the model's native distribution, integrates seamlessly with search-based methods such as Best-of-N, and reduces computational cost by up to 6x.
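The sparsity mechanism described above can be illustrated with a minimal decoding-loop sketch: steer only when next-token entropy crosses a threshold, and otherwise sample from the model's intrinsic distribution. This is not the authors' implementation; `model`, `steer`, the threshold `tau`, and the strength `alpha` are hypothetical stand-ins for SIA's alignment-related reward signal and its junction-selection rule.

```python
# Minimal sketch of entropy-gated ("sparse junction") steering. Assumptions:
#  - `model(ids)` returns next-token logits of shape (batch, seq, vocab),
#  - `steer(ids)` returns an alignment-related logit bias (a stand-in for
#    the reward signal SIA injects; the paper's formulation may differ),
#  - `tau` is a hypothetical entropy threshold; raising it steers fewer
#    tokens (the paper reports steering 20% to 80% of tokens works well).
import torch
import torch.nn.functional as F

def decode_with_sparse_steering(model, steer, prompt_ids, max_new_tokens=128,
                                tau=2.0, alpha=1.0):
    ids = prompt_ids.clone()                       # shape (1, prompt_len)
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]              # next-token logits
        logp = F.log_softmax(logits, dim=-1)
        entropy = -(logp.exp() * logp).sum(-1)     # entropy in nats
        # Intervene only at high-entropy junctions; elsewhere leave the
        # model's intrinsic distribution untouched.
        if entropy.item() > tau:                   # assumes batch size 1
            logits = logits + alpha * steer(ids)
        next_id = torch.multinomial(F.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids
```

Because each sequence is steered at only a fraction of steps, the same loop drops into Best-of-N search unchanged: sample N completions with `decode_with_sparse_steering` and keep the one a reward model scores highest.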
Related papers
- Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions [37.08071497197165]
Intervention-based model steering offers a lightweight and interpretable alternative to prompting and fine-tuning. We build on the principles of distributed alignment search to propose a new steering method: Concept DAS. We show that Concept DAS does not always outperform preference-optimization methods but may benefit more from increased model scale.
arXiv Detail & Related papers (2026-02-05T02:51:00Z)
- D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning [49.16227597771663]
D2Pruner is a framework that combines debiased importance with a structural pruning mechanism. It reduces FLOPs by 74.2% while retaining 99.2% of its original performance. It marks a significant advancement with up to 63.53% improvement over existing methods.
arXiv Detail & Related papers (2025-12-22T14:42:31Z)
- Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank [65.00301565190824]
Repulsor is a plug-and-play training framework that requires no external encoders. It achieves a state-of-the-art FID of 2.40 within 400k steps, significantly outperforming comparable methods.
arXiv Detail & Related papers (2025-12-09T14:39:26Z)
- Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models [16.540220733551823]
Large Vision-Language Models (VLMs) enable strong multimodal reasoning but incur heavy inference costs from redundant visual tokens. Attention-based methods rely on raw attention scores, which are often unstable across layers and heads. We propose a training-free framework built on a simple intuition.
arXiv Detail & Related papers (2025-09-29T14:20:05Z)
- Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching [36.348940136801296]
A novel guidance framework for discrete data is proposed to address this problem. We derive the exact transition rate for the desired distribution given a learned discrete flow matching model. We demonstrate the effectiveness of the proposed guidance on energy-guided simulations and preference alignment on text-to-image generation and multimodal understanding tasks.
arXiv Detail & Related papers (2025-09-26T05:51:31Z)
- ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) model should capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose an end-to-end Energy-Regularized Information for Shift-Robustness (ERIS) framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z)
- GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers, guided by token-level attribution signals, and normalizes activations to preserve representational scale; a hedged sketch of this step appears after this list. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z)
- Learning Distribution-Wise Control in Representation Space for Language Models [7.756342860929851]
Learnable interventions aim to apply pointwise control within the concept subspace and have proven effective in altering high-level behaviors. We extend this approach to the distribution level, enabling the model to learn not only pointwise transformations but also the surrounding regions of the concept subspace.
arXiv Detail & Related papers (2025-06-07T06:52:58Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. The method not only outperforms state-of-the-art implicit techniques like BatchEnsemble, but even matches or exceeds the accuracy of an Explicit Ensemble.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios. We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
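As referenced in the GrAInS entry above, the normalized activation-steering step its summary describes can be sketched as follows. This is a hedged reconstruction from the summary alone, not the authors' code; the direction `v`, the strength `alpha`, and the forward-hook usage are assumptions.

```python
# Hedged sketch of the activation-steering step in the GrAInS summary above.
# Assumptions: `h` is the hidden-state tensor (batch, seq, dim) of one
# transformer layer captured via a forward hook; `v` is a steering direction
# derived offline from token-level gradient attributions; `alpha` is a
# hypothetical strength parameter.
import torch

def steer_hidden(h: torch.Tensor, v: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    orig_norm = h.norm(dim=-1, keepdim=True)   # per-token activation norms
    h = h + alpha * v / v.norm()               # shift along the direction
    # Rescale each token back to its original norm, matching the summary's
    # "normalizes activations to preserve representational scale".
    return h * (orig_norm / h.norm(dim=-1, keepdim=True))
```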