Efficient and accurate steering of Large Language Models through attention-guided feature learning
- URL: http://arxiv.org/abs/2602.00333v1
- Date: Fri, 30 Jan 2026 21:35:13 GMT
- Title: Efficient and accurate steering of Large Language Models through attention-guided feature learning
- Authors: Parmida Davarmanesh, Ashia Wilson, Adityanarayanan Radhakrishnan,
- Abstract summary: We introduce an attention-guided steering framework that overcomes three core challenges associated with steering. Across a steering benchmark of 512 semantic concepts, our framework substantially improved steering over previous state-of-the-art. Our framework opens further avenues for developing efficient, highly-scalable fine-tuning algorithms for industry-scale LLMs.
- Score: 2.2940141855172036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Steering, or direct manipulation of internal activations to guide LLM responses toward specific semantic concepts, is emerging as a promising avenue for both understanding how semantic concepts are stored within LLMs and advancing LLM capabilities. Yet, existing steering methods are remarkably brittle, with seemingly non-steerable concepts becoming completely steerable based on subtle algorithmic choices in how concept-related features are extracted. In this work, we introduce an attention-guided steering framework that overcomes three core challenges associated with steering: (1) automatic selection of relevant token embeddings for extracting concept-related features; (2) accounting for heterogeneity of concept-related features across LLM activations; and (3) identification of layers most relevant for steering. Across a steering benchmark of 512 semantic concepts, our framework substantially improved steering over previous state-of-the-art (nearly doubling the number of successfully steered concepts) across model architectures and sizes (up to 70 billion parameter models). Furthermore, we use our framework to shed light on the distribution of concept-specific features across LLM layers. Overall, our framework opens further avenues for developing efficient, highly-scalable fine-tuning algorithms for industry-scale LLMs.
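The abstract describes steering as direct manipulation of internal activations toward a semantic concept. A common form of this intervention, additive activation steering, adds a scaled concept direction to a layer's hidden states. The sketch below illustrates the idea on toy numpy activations; the dimensions, the concept vector, and the scale `alpha` are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def steer(hidden, concept_vec, alpha=4.0):
    """Additive activation steering (illustrative): add a scaled,
    unit-normalized concept direction to every token's hidden state."""
    direction = concept_vec / np.linalg.norm(concept_vec)
    return hidden + alpha * direction

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 16))   # 5 tokens, 16-dim toy activations
concept = rng.normal(size=16)       # hypothetical concept feature direction

steered = steer(hidden, concept, alpha=4.0)

# Projection onto the concept direction rises by exactly alpha per token,
# which is how such interventions push generations toward the concept.
unit = concept / np.linalg.norm(concept)
proj_before = hidden @ unit
proj_after = steered @ unit
```

In practice this addition is applied inside the model (e.g. to a chosen layer's residual stream during the forward pass); the challenges the paper targets, such as which tokens to extract features from and which layers to intervene on, are precisely the choices this toy sketch hard-codes.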
Related papers
- LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model [8.63792214154021]
This paper presents LLM-MLFFN, a novel large language model (LLM)-enhanced multi-level feature fusion network. The proposed LLM-MLFFN framework integrates priors from large-scale pre-trained models and employs a multi-level approach to enhance classification accuracy. Evaluation on the open trajectory dataset demonstrates the superior performance of the proposed LLM-MLFFN, achieving a classification accuracy of over 94%.
arXiv Detail & Related papers (2026-03-03T02:26:04Z) - ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment [49.68063561145927]
We propose a unified ordinary differential equations (ODEs)-based theoretical framework for activation steering. We introduce ODESteer, an ODE-based steering method guided by barrier functions. Compared to state-of-the-art activation steering methods, ODESteer achieves consistent empirical improvements.
arXiv Detail & Related papers (2026-02-19T17:13:44Z) - Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models [75.45940282834327]
We introduce Viewpoint Learning, a task designed to evaluate and improve the spatial reasoning capabilities of MLLMs. We present the Viewpoint-100K dataset, consisting of 100K object-centric image pairs with diverse viewpoints and corresponding question-answer pairs. Our approach employs a two-stage fine-tuning strategy, resulting in significant improvements across multiple tasks.
arXiv Detail & Related papers (2025-11-03T14:27:00Z) - Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient computation. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models [48.40096116617163]
Large Language Models (LLMs) demonstrate the ability to solve reasoning and mathematical problems using the Chain-of-Thought (CoT) technique. This work, inspired by the deep thinking paradigm of DeepSeek-R1, utilizes a steering technique to enhance the reasoning ability of an LLM without external datasets.
arXiv Detail & Related papers (2025-05-21T15:17:59Z) - ExpertSteer: Intervening in LLMs through Expert Knowledge [86.98098988779809]
Activation steering offers a promising method to control the generation process of Large Language Models. We propose ExpertSteer, a novel approach that leverages arbitrary specialized expert models to generate steering vectors. We conduct comprehensive experiments using three LLMs on 15 popular benchmarks across four distinct domains.
arXiv Detail & Related papers (2025-05-18T08:55:46Z) - Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning [69.5875073447454]
This paper advances motion agents empowered by large language models (LLMs) toward autonomous navigation in dynamic and cluttered environments. Our training-free framework supports multi-agent coordination, closed-loop replanning, and dynamic obstacle avoidance without retraining or fine-tuning.
arXiv Detail & Related papers (2025-03-10T13:39:09Z) - CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting [14.567180355849501]
CoT-Drive is a novel approach that enhances motion forecasting by leveraging large language models (LLMs) and a chain-of-thought (CoT) prompting method. We introduce a teacher-student knowledge distillation strategy to effectively transfer LLMs' advanced scene understanding capabilities to lightweight language models (LMs). We present two new scene description datasets, Highway-Text and Urban-Text, designed for fine-tuning lightweight LMs to generate context-specific semantic annotations.
arXiv Detail & Related papers (2025-03-10T12:17:38Z) - CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models [37.476241509187304]
Large Language Models (LLMs) achieve remarkable performance through pretraining on extensive data. The lack of interpretability in their underlying mechanisms limits the ability to effectively steer LLMs for specific applications. We introduce an efficient selective layer intervention based on prominent parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-10-23T09:40:15Z) - LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [84.31119464141631]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios. Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, even in multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.