SteerX: Disentangled Steering for LLM Personalization
- URL: http://arxiv.org/abs/2510.22256v1
- Date: Sat, 25 Oct 2025 11:26:20 GMT
- Title: SteerX: Disentangled Steering for LLM Personalization
- Authors: Xiaoyan Zhao, Ming Yan, Yilun Qiu, Haoting Ni, Yang Zhang, Fuli Feng, Hong Cheng, Tat-Seng Chua
- Abstract summary: Large language models (LLMs) have shown remarkable success in recent years, enabling a wide range of applications. A critical factor in building such assistants is personalizing LLMs, as user preferences and needs vary widely. We propose SteerX, a method that isolates preference-driven components from preference-agnostic components.
- Score: 75.89038195784701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have shown remarkable success in recent years, enabling a wide range of applications, including intelligent assistants that support users' daily life and work. A critical factor in building such assistants is personalizing LLMs, as user preferences and needs vary widely. Activation steering, which directly leverages directions representing user preference in the LLM activation space to adjust its behavior, offers a cost-effective way to align the model's outputs with individual users. However, existing methods rely on all historical data to compute the steering vector, ignoring that not all content reflects true user preferences, which undermines the personalization signal. To address this, we propose SteerX, a disentangled steering method that isolates preference-driven components from preference-agnostic components. Grounded in causal inference theory, SteerX estimates token-level causal effects to identify preference-driven tokens, transforms these discrete signals into a coherent description, and then leverages them to steer personalized LLM generation. By focusing on the truly preference-driven information, SteerX produces more accurate activation steering vectors and enhances personalization. Experiments on two representative steering backbone methods across real-world datasets demonstrate that SteerX consistently enhances steering vector quality, offering a practical solution for more effective LLM personalization.
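The abstract describes activation steering as adding a preference direction to the model's hidden activations at inference time. The following is a minimal, hypothetical sketch of the generic technique (not SteerX's specific token-level causal method): the steering vector is taken as the mean activation difference between preference-reflecting and neutral inputs, and is then added, scaled, to a hidden state. All activations here are random stand-ins rather than real LLM activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden-state dimensionality

# Stand-in activations: one batch gathered on preference-reflecting
# inputs, one on neutral inputs (in practice these come from an LLM layer).
pref_acts = rng.normal(loc=1.0, size=(8, d_model))
neutral_acts = rng.normal(loc=0.0, size=(8, d_model))

# A common recipe: the steering vector is the mean activation difference.
steer_vec = pref_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(hidden_state: np.ndarray, vec: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift a hidden state along the steering direction with strength alpha."""
    return hidden_state + alpha * vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

h = rng.normal(size=d_model)          # a hidden state during generation
h_steered = steer(h, steer_vec, alpha=0.5)

# Steering moves the hidden state toward the preference direction.
assert cosine(h_steered, steer_vec) > cosine(h, steer_vec)
```

SteerX's contribution, per the abstract, is upstream of this step: it filters the history so that only preference-driven tokens contribute to the vector, rather than averaging over all historical data as above.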
Related papers
- SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support [3.755588097509539]
Online product reviews contain rich but noisy signals that overwhelm users and hinder effective decision-making. We propose a steerable review summarization framework that aligns outputs with explicit user personas to support personalized purchase decisions. Our results highlight the promise of steerable pluralistic alignment for building next-generation personalized decision-support systems.
arXiv Detail & Related papers (2025-12-12T18:05:52Z) - Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement [31.282134977964976]
Existing steering methods rely on large-scale datasets to learn clear behavioral information. We introduce Refinement of Steering Vector via Sparse Autoencoder (SAE-RSV) that leverages SAEs to semantically denoise and augment the steering vectors. In our framework, we first remove task-irrelevant features according to their semantics provided by SAEs, and then enrich task-relevant features missing from the small dataset through their semantic similarity to the identified relevant features.
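The denoising half of the procedure above can be sketched as follows. This is a hypothetical illustration, not the SAE-RSV implementation: a steering vector is decomposed into non-negative feature activations against a unit-norm dictionary (a linear-ReLU stand-in for an SAE), the weaker features are treated as "task-irrelevant" and zeroed out, and the vector is reconstructed from the remainder.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_feat = 16, 32  # toy hidden dimension and dictionary size

# Unit-norm random dictionary standing in for trained SAE decoder directions.
decoder = rng.normal(size=(n_feat, d))
decoder /= np.linalg.norm(decoder, axis=1, keepdims=True)

# Encode a noisy steering vector as non-negative feature activations
# (ReLU of projections onto the dictionary).
steer_vec = rng.normal(size=d)
feats = np.maximum(decoder @ steer_vec, 0.0)

# Keep only the strongest active features; in SAE-RSV this selection is
# driven by feature semantics, here simply by activation magnitude.
keep = feats > np.quantile(feats[feats > 0], 0.5)
denoised = (feats * keep) @ decoder  # reconstruct from retained features
```

The augmentation half of SAE-RSV, which adds task-relevant features missing from a small dataset, would amount to raising the activations of dictionary features semantically similar to the retained ones before reconstructing.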
arXiv Detail & Related papers (2025-09-28T10:49:22Z) - NextQuill: Causal Preference Modeling for Enhancing LLM Personalization [82.15961484963256]
We introduce NextQuill, a novel personalization framework grounded in causal preference modeling. Building on this insight, NextQuill introduces two complementary alignment strategies. Experiments across multiple personalization benchmarks demonstrate that NextQuill significantly improves personalization quality.
arXiv Detail & Related papers (2025-06-03T02:08:55Z) - ExpertSteer: Intervening in LLMs through Expert Knowledge [86.98098988779809]
Activation steering offers a promising method to control the generation process of Large Language Models. We propose ExpertSteer, a novel approach that leverages arbitrary specialized expert models to generate steering vectors. We conduct comprehensive experiments using three LLMs on 15 popular benchmarks across four distinct domains.
arXiv Detail & Related papers (2025-05-18T08:55:46Z) - Effectively Steer LLM To Follow Preference via Building Confident Directions [39.40603123075168]
We propose a theoretical framework to understand and quantify the model steering methods. Inspired by the framework, we propose a confident direction steering method (CONFST) that steers LLMs via modifying their activations. Our approach offers three key advantages over popular bidirectional model steering methods.
arXiv Detail & Related papers (2025-03-04T20:32:27Z) - Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization [68.79814761867314]
We propose Difference-aware Personalization Learning (DPL) to enhance Large Language Models (LLMs) personalization. DPL strategically selects representative users for comparison and establishes a structured standard to extract task-relevant differences. Experiments on real-world datasets demonstrate that DPL significantly enhances LLM personalization.
arXiv Detail & Related papers (2025-03-04T09:53:26Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from vast text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.