Related papers: EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

URL: http://arxiv.org/abs/2509.25175v1
Date: Mon, 29 Sep 2025 17:59:07 GMT
Title: EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
Authors: Haolei Xu, Xinyu Mei, Yuchen Yan, Rui Zhou, Wenqi Zhang, Weiming Lu, Yueting Zhuang, Yongliang Shen
Abstract summary: Large language model (LLM) steering has emerged as a promising paradigm for controlling model behavior at inference time. We present EasySteer, a unified framework for high-performance, extensible LLM steering built on vLLM.
Score: 55.56674028743782
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) steering has emerged as a promising paradigm for controlling model behavior at inference time through targeted manipulation of hidden states, offering a lightweight alternative to expensive retraining. However, existing steering frameworks suffer from critical limitations: computational inefficiency, limited extensibility, and restricted functionality that hinder both research progress and practical deployment. We present EasySteer, a unified framework for high-performance, extensible LLM steering built on vLLM. Our system features modular architecture with pluggable interfaces for both analysis-based and learning-based methods, fine-grained parameter control, pre-computed steering vectors for eight application domains, and an interactive demonstration system. Through deep integration with vLLM's optimized inference engine, EasySteer achieves 5.5-11.4× speedup over existing frameworks. Extensive experiments demonstrate its effectiveness in overthinking mitigation, hallucination reduction, and other key applications. EasySteer transforms steering from research technique to production-ready capability, establishing critical infrastructure for deployable, controllable language models.
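
Illustration: the abstract describes steering as targeted manipulation of hidden states using steering vectors. The sketch below shows one common way this idea is realized in general, a difference-of-means vector added to a hidden state with a scaling factor. It is not EasySteer's API or implementation; the function names (mean_vector, difference_of_means, apply_steering) and the toy numbers are hypothetical, and plain Python lists stand in for real model activations.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty batch of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all activation vectors must have the same length")
    return [sum(column) / len(rows) for column in zip(*rows)]


def difference_of_means(positive: Sequence[Sequence[float]],
                        negative: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical analysis-style steering vector: mean(positive) - mean(negative)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    if len(pos_mean) != len(neg_mean):
        raise ValueError("positive and negative activations differ in width")
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Sequence[float],
                   direction: Sequence[float],
                   strength: float) -> Vector:
    """Shift one hidden state along the steering direction: h + strength * v."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and steering vector differ in width")
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy activations (assumed values, for illustration only).
positive_examples = [[1.0, 2.0], [3.0, 4.0]]
negative_examples = [[0.0, 1.0], [2.0, 1.0]]

steering_vector = difference_of_means(positive_examples, negative_examples)
steered_state = apply_steering([0.5, 0.5], steering_vector, strength=2.0)
```

In a real system the shift would be applied to a chosen layer's activations during the forward pass; the strength parameter here plays the role of the kind of fine-grained control the abstract mentions.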

Related papers

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion [91.35078719566472]
Vision-Language Models (VLMs) create a severe visual feature bottleneck by using a crude, asymmetric connection. We introduce Cross-Layer Injection (CLI), a novel and lightweight framework that forges a dynamic many-to-many bridge between the two modalities.
arXiv Detail & Related papers (2026-01-15T18:59:10Z)
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering [62.63376387138257]
We propose a plug-and-play intervention framework that adaptively steers large language models (LLMs) reasoning in activation space. RISER constructs a library of reusable reasoning vectors and employs a lightweight Router to dynamically compose them for each input. The Router is optimized via reinforcement learning under task-level rewards, activating latent cognitive primitives in an emergent and compositional manner.
arXiv Detail & Related papers (2026-01-14T08:04:33Z)
AR-MOT: Autoregressive Multi-object Tracking [56.09738000988466]
We propose a novel autoregressive paradigm that formulates MOT as a sequence generation task within a large language model (LLM) framework. This design enables the model to output structured results through flexible sequence construction, without requiring any task-specific heads. To enhance region-level visual perception, we introduce an Object Tokenizer based on a pretrained detector.
arXiv Detail & Related papers (2026-01-05T09:17:28Z)
SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models [4.506695482619111]
This work introduces SteerVLM, a lightweight steering module for Vision-Language Models (VLMs). Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. Our steering module requires learning parameters equal to 0.14% of the original VLM's size.
arXiv Detail & Related papers (2025-10-30T17:52:39Z)
Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time [12.659582318581606]
Current approaches, including Vision Foundation Models (VFMs) and Multimodal Large Language Models (MLLMs), enhance semantic understanding but suffer from high computational overhead and limited cross-domain transferability when fine-tuned. We propose a novel framework employing Test-Time Scaling (TTS) that leverages vision-language alignment capabilities through Guidance-based methods for direct similarity scoring. Our approach eliminates two-stage processing by employing structured prompts that generate length-controllable scoring outputs.
arXiv Detail & Related papers (2025-09-02T09:25:13Z)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
Optimizing Small Language Models for In-Vehicle Function-Calling [4.148443557388842]
We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience.
arXiv Detail & Related papers (2025-01-04T17:32:56Z)
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering [30.51487692912812]
Multimodal Large Language Models (MLLMs) have significantly advanced visual tasks by integrating visual representations into large language models (LLMs). We introduce Modality Linear Representation-Steering (MoReS) to achieve the goal. MoReS effectively re-balances the intrinsic modalities throughout the model, where the key idea is to steer visual representations through linear transformations in the visual subspace across each model layer.
arXiv Detail & Related papers (2024-12-16T21:14:11Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection [2.1155908599769764]
We propose UniProj-Det, a lightweight modular framework for parameter-efficient open-vocabulary object detection. UniProj-Det freezes pretrained backbones and introduces a Universal Projection module with a learnable modality token, enabling unified vision-language adaptation at minimal cost.
arXiv Detail & Related papers (2024-08-20T12:27:53Z)
Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module, the SAM-guidEd refinEment Module (SEEM). This lightweight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features. We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.