Designing Role Vectors to Improve LLM Inference Behaviour
- URL: http://arxiv.org/abs/2502.12055v1
- Date: Mon, 17 Feb 2025 17:24:37 GMT
- Title: Designing Role Vectors to Improve LLM Inference Behaviour
- Authors: Daniele Potertì, Andrea Seveso, Fabio Mercorio
- Abstract summary: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting.
- Score: 8.995812770349605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We measure two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving task performance in relevant domains while marginally affecting unrelated tasks. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.
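To make the two interventions concrete, the following is a minimal sketch (not the authors' implementation) of how they act on a single residual-stream activation, assuming a role vector has already been extracted; the tensor names and the scaling coefficient `alpha` are illustrative assumptions.

```python
# Minimal sketch of the two interventions on a residual-stream activation h,
# given a role vector r extracted beforehand. Names (h, r, alpha) are assumptions.
import torch

def activation_addition(h: torch.Tensor, r: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Reinforce the role-specific direction by adding a scaled copy of it."""
    return h + alpha * r

def directional_ablation(h: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the role-specific direction by projecting it out of the activation."""
    r_hat = r / r.norm()                  # unit-norm direction
    return h - (h @ r_hat) * r_hat        # subtract the component of h along r

# Toy usage on random tensors standing in for a 4096-dim hidden state.
h = torch.randn(4096)
r = torch.randn(4096)
h_boosted = activation_addition(h, r, alpha=2.0)
h_ablated = directional_ablation(h, r)
print(torch.dot(h_ablated, r / r.norm()).item())   # ~0: role direction removed
```

In practice such edits would be applied at chosen layers during the forward pass (e.g., via hooks), with the coefficient controlling how strongly the role is imposed.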
Related papers
- Improving Reasoning Performance in Large Language Models via Representation Engineering [2.0099933815960256]
We propose a representation engineering approach for large language models (LLMs).
Model activations are read from the residual stream of an LLM when processing a reasoning task.
We show that an LLM can, to a certain degree, be controlled to improve its perceived reasoning ability by modulating activations.
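As a rough illustration of that read-and-modulate pattern (a sketch under assumptions, not the paper's code), a forward hook can capture a block's residual-stream output and shift it by a control vector at inference time:

```python
# Sketch only: a toy block stands in for a real transformer layer. The hook
# reads the residual-stream output and nudges it along a hypothetical control vector.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)
    def forward(self, x):
        return x + self.linear(x)                  # residual connection

block = ToyBlock()
control = torch.zeros(64)
control[0] = 1.0                                   # invented control direction
captured = {}

def hook(module, inputs, output):
    captured["resid"] = output.detach()            # read the activation
    return output + 0.5 * control                  # modulate it before it flows on

handle = block.register_forward_hook(hook)
_ = block(torch.randn(1, 8, 64))                   # (batch, seq, d_model)
handle.remove()
print(captured["resid"].shape)                     # torch.Size([1, 8, 64])
```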
arXiv Detail & Related papers (2025-04-28T04:58:43Z)
- Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning [11.791625302942418]
We introduce an interpretable DIRL framework for analyzing activity-travel decision processes.
Our proposed framework adapts an adversarial IRL approach to infer the reward and policy functions of activity-travel behavior.
Our analysis of real-world travel survey data reveals promising results in two key areas.
arXiv Detail & Related papers (2025-03-17T02:54:02Z)
- Teaching LLMs to Refine with Tools [68.23479664749271]
Large language models (LLMs) can refine their responses based on feedback, enabling self-improvement through iterative training or test-time refinement. We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.
arXiv Detail & Related papers (2024-12-22T05:43:50Z)
- Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
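One plausible way to surface such "core experts" (a hedged sketch with simulated routing counts, not the paper's procedure) is to compare how often each expert is routed to with versus without retrieved context and rank experts by the gap:

```python
# Hedged sketch: simulate per-expert routing counts with and without retrieved
# context, then rank experts by the change in relative usage. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n_experts = 16
routing_with_rag = rng.poisson(lam=20, size=n_experts).astype(float)
routing_without_rag = rng.poisson(lam=20, size=n_experts).astype(float)
routing_with_rag[[3, 7]] += 40            # pretend experts 3 and 7 specialize in RAG inputs

usage_gap = routing_with_rag / routing_with_rag.sum() \
          - routing_without_rag / routing_without_rag.sum()
core_experts = np.argsort(usage_gap)[::-1][:3]
print("candidate core experts:", core_experts)
```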
arXiv Detail & Related papers (2024-10-20T16:08:54Z)
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions are important for quantifying the impact of individual training data points on a model's predictions. We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
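For reference, the quantity being evaluated is the classical influence-function estimate; here is a minimal sketch on a tiny logistic-regression model, illustrating the formula rather than the LLM-scale approximations the paper tests:

```python
# Minimal sketch of the classical influence-function estimate
#   I(z_train, z_test) = - grad L(z_test)^T  H^{-1}  grad L(z_train)
# where H is the Hessian of the mean training loss at the fitted parameters.
import torch

torch.manual_seed(0)
X = torch.randn(50, 3)
y = (X[:, 0] > 0).float()
theta = torch.zeros(3, requires_grad=True)

def loss(w, xb, yb):
    return torch.nn.functional.binary_cross_entropy_with_logits(xb @ w, yb)

# Roughly fit the model with a few gradient steps.
opt = torch.optim.SGD([theta], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    loss(theta, X, y).backward()
    opt.step()

H = torch.autograd.functional.hessian(lambda w: loss(w, X, y), theta.detach())
H_inv = torch.linalg.inv(H + 1e-3 * torch.eye(3))     # damping for stability

g_test = torch.autograd.grad(loss(theta, X[:1], y[:1]), theta)[0]
g_train = torch.autograd.grad(loss(theta, X[1:2], y[1:2]), theta)[0]
influence = -g_test @ H_inv @ g_train
print(float(influence))   # negative: upweighting this training point would lower the test loss
```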
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
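For context, the Shapley-value side of such an attribution can be illustrated with an exact computation over a toy value function (the Optimal Transport modelling of the shift itself is not reproduced here; feature names and drop values are invented):

```python
# Illustrative sketch only: exact Shapley attribution of a performance drop
# to shifted feature subsets, using an additive toy value function.
from itertools import combinations
from math import factorial

features = ["age", "income", "region"]            # hypothetical shifted features

def perf_drop(shifted):
    """Toy value function: performance drop when a subset of features has shifted."""
    drop = {"age": 0.02, "income": 0.10, "region": 0.01}
    return sum(drop[f] for f in shifted)          # additive for simplicity

def shapley(feature):
    n = len(features)
    others = [f for f in features if f != feature]
    total = 0.0
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (perf_drop(set(coalition) | {feature}) - perf_drop(set(coalition)))
    return total

for f in features:
    print(f, round(shapley(f), 4))                # additive toy: equals each feature's own drop
```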
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks? [51.42906577386907]
This study explores the factors influencing the performance of Large Language Models (LLMs) in causal discovery tasks.
A higher frequency of causal mentions correlates with better model performance, suggesting that extensive exposure to causal information during training enhances the models' causal discovery capabilities.
arXiv Detail & Related papers (2024-07-29T01:45:05Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
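For orientation, the simpler difference-of-means baseline that such work builds on can be sketched as follows (simulated activations; the paper's contribution is to optimize the vector directly against generation probabilities rather than use this heuristic):

```python
# Hedged sketch of the "difference of means" baseline for building a steering
# vector from contrastive activation pairs (preferred vs. dispreferred completions).
import torch

torch.manual_seed(0)
d_model = 512
acts_preferred = torch.randn(100, d_model) + 0.3     # activations on preferred completions
acts_dispreferred = torch.randn(100, d_model) - 0.3  # activations on dispreferred completions

steering_vector = acts_preferred.mean(dim=0) - acts_dispreferred.mean(dim=0)
steering_vector = steering_vector / steering_vector.norm()

# At inference, the vector would be added to the residual stream with a chosen strength.
h = torch.randn(d_model)
h_steered = h + 4.0 * steering_vector
```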
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks. In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types. These benchmarks allow us to separate the LLMs' ability to accurately predict changes resulting from interventions from their tendency to memorize facts or find other shortcuts.
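A toy example of the interventional update being probed (invented coefficients; confounding graph Z → X, Z → Y, X → Y):

```python
# Toy structural causal model with confounding. Under do(X = x0) the edge
# Z -> X is cut, so conditioning on X and intervening on X give different answers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Z = rng.normal(size=n)
X_obs = 0.8 * Z + rng.normal(size=n)           # observational X depends on Z
Y_obs = 1.5 * X_obs + 0.5 * Z + rng.normal(size=n)

x0 = 1.0
X_do = np.full(n, x0)                          # intervention: set X regardless of Z
Y_do = 1.5 * X_do + 0.5 * Z + rng.normal(size=n)

# Observational conditioning overstates the effect because of the confounder Z.
print("E[Y | X ~= 1] (observational):", Y_obs[np.abs(X_obs - x0) < 0.05].mean())
print("E[Y | do(X=1)]:", Y_do.mean())          # ~1.5, the true causal effect
```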
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- An Empirical Study of In-context Learning in LLMs for Machine Translation [10.97460689696944]
This study provides an exhaustive analysis of in-context learning (ICL) for machine translation.
We first establish that ICL is primarily example-driven and not instruction-driven.
Our analysis includes factors such as quality and quantity of demonstrations, spatial proximity, and source versus target originality.
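As an illustration of the example-driven setup analyzed (a hedged sketch with invented demonstrations; the study's prompts and language pairs may differ):

```python
# Sketch of an instruction-free, example-driven ICL prompt for translation:
# source/target demonstration pairs followed by the new source, target left blank.
demonstrations = [
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
]

def build_icl_prompt(pairs, source_sentence):
    """Concatenate demonstration pairs, then the new source sentence."""
    lines = [f"English: {src}\nFrench: {tgt}" for src, tgt in pairs]
    lines.append(f"English: {source_sentence}\nFrench:")
    return "\n\n".join(lines)

print(build_icl_prompt(demonstrations, "How much does this cost?"))
```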
arXiv Detail & Related papers (2024-01-22T16:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.