Designing Role Vectors to Improve LLM Inference Behaviour
- URL: http://arxiv.org/abs/2502.12055v1
- Date: Mon, 17 Feb 2025 17:24:37 GMT
- Title: Designing Role Vectors to Improve LLM Inference Behaviour
- Authors: Daniele Potertì, Andrea Seveso, Fabio Mercorio
- Abstract summary: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting.
- Score: 8.995812770349605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We measure two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving task performance in relevant domains while marginally affecting unrelated tasks. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.
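To make the two interventions concrete, the following is a minimal sketch (not the authors' implementation) of how they act on a single residual-stream activation, assuming a role vector has already been extracted; the tensor names and the scaling coefficient `alpha` are illustrative assumptions.

```python
# Minimal sketch of the two interventions on a residual-stream activation h,
# given a role vector r extracted beforehand. Names (h, r, alpha) are assumptions.
import torch

def activation_addition(h: torch.Tensor, r: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Reinforce the role-specific direction by adding a scaled copy of it."""
    return h + alpha * r

def directional_ablation(h: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the role-specific direction by projecting it out of the activation."""
    r_hat = r / r.norm()                  # unit-norm direction
    return h - (h @ r_hat) * r_hat        # subtract the component of h along r

# Toy usage on random tensors standing in for a 4096-dim hidden state.
h = torch.randn(4096)
r = torch.randn(4096)
h_boosted = activation_addition(h, r, alpha=2.0)
h_ablated = directional_ablation(h, r)
print(torch.dot(h_ablated, r / r.norm()).item())   # ~0: role direction removed
```

In practice such edits would be applied at chosen layers during the forward pass (e.g., via hooks), with the coefficient controlling how strongly the role is imposed.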
Related papers
- Improving Reasoning Performance in Large Language Models via Representation Engineering [2.0099933815960256]
We propose a representation engineering approach for large language models (LLMs).
Model activations are read from the residual stream of an LLM when processing a reasoning task.
We show that an LLM can, to a certain degree, be controlled to improve its perceived reasoning ability by modulating activations.
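As a rough illustration of that read-and-modulate pattern (a sketch under assumptions, not the paper's code), a forward hook can capture a block's residual-stream output and shift it by a control vector at inference time:

```python
# Sketch only: a toy block stands in for a real transformer layer. The hook
# reads the residual-stream output and nudges it along a hypothetical control vector.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)
    def forward(self, x):
        return x + self.linear(x)                  # residual connection

block = ToyBlock()
control = torch.zeros(64)
control[0] = 1.0                                   # invented control direction
captured = {}

def hook(module, inputs, output):
    captured["resid"] = output.detach()            # read the activation
    return output + 0.5 * control                  # modulate it before it flows on

handle = block.register_forward_hook(hook)
_ = block(torch.randn(1, 8, 64))                   # (batch, seq, d_model)
handle.remove()
print(captured["resid"].shape)                     # torch.Size([1, 8, 64])
```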
arXiv Detail & Related papers (2025-04-28T04:58:43Z)
- Analyzing sequential activity and travel decisions with interpretable deep inverse reinforcement learning [11.791625302942418]
We introduce an interpretable DIRL framework for analyzing activity-travel decision processes.
Our proposed framework adapts an adversarial IRL approach to infer the reward and policy functions of activity-travel behavior.
Our analysis of real-world travel survey data reveals promising results in two key areas.
arXiv Detail & Related papers (2025-03-17T02:54:02Z)
- Teaching LLMs to Refine with Tools [68.23479664749271]
Large language models (LLMs) can refine their responses based on feedback, enabling self-improvement through iterative training or test-time refinement. We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.
arXiv Detail & Related papers (2024-12-22T05:43:50Z)
- Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
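One plausible way to surface such "core experts" (a hedged sketch with simulated routing counts, not the paper's procedure) is to compare how often each expert is routed to with versus without retrieved context and rank experts by the gap:

```python
# Hedged sketch: simulate per-expert routing counts with and without retrieved
# context, then rank experts by the change in relative usage. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n_experts = 16
routing_with_rag = rng.poisson(lam=20, size=n_experts).astype(float)
routing_without_rag = rng.poisson(lam=20, size=n_experts).astype(float)
routing_with_rag[[3, 7]] += 40            # pretend experts 3 and 7 specialize in RAG inputs

usage_gap = routing_with_rag / routing_with_rag.sum() \
          - routing_without_rag / routing_without_rag.sum()
core_experts = np.argsort(usage_gap)[::-1][:3]
print("candidate core experts:", core_experts)
```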
arXiv Detail & Related papers (2024-10-20T16:08:54Z)
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions are important for quantifying the impact of individual training data points on a model's predictions. We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
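For reference, the quantity being evaluated is the classical influence-function estimate; here is a minimal sketch on a tiny logistic-regression model, illustrating the formula rather than the LLM-scale approximations the paper tests:

```python
# Minimal sketch of the classical influence-function estimate
#   I(z_train, z_test) = - grad L(z_test)^T  H^{-1}  grad L(z_train)
# where H is the Hessian of the mean training loss at the fitted parameters.
import torch

torch.manual_seed(0)
X = torch.randn(50, 3)
y = (X[:, 0] > 0).float()
theta = torch.zeros(3, requires_grad=True)

def loss(w, xb, yb):
    return torch.nn.functional.binary_cross_entropy_with_logits(xb @ w, yb)

# Roughly fit the model with a few gradient steps.
opt = torch.optim.SGD([theta], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    loss(theta, X, y).backward()
    opt.step()

H = torch.autograd.functional.hessian(lambda w: loss(w, X, y), theta.detach())
H_inv = torch.linalg.inv(H + 1e-3 * torch.eye(3))     # damping for stability

g_test = torch.autograd.grad(loss(theta, X[:1], y[:1]), theta)[0]
g_train = torch.autograd.grad(loss(theta, X[1:2], y[1:2]), theta)[0]
influence = -g_test @ H_inv @ g_train
print(float(influence))   # negative: upweighting this training point would lower the test loss
```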
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
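For context, the Shapley-value side of such an attribution can be illustrated with an exact computation over a toy value function (the Optimal Transport modelling of the shift itself is not reproduced here; feature names and drop values are invented):

```python
# Illustrative sketch only: exact Shapley attribution of a performance drop
# to shifted feature subsets, using an additive toy value function.
from itertools import combinations
from math import factorial

features = ["age", "income", "region"]            # hypothetical shifted features

def perf_drop(shifted):
    """Toy value function: performance drop when a subset of features has shifted."""
    drop = {"age": 0.02, "income": 0.10, "region": 0.01}
    return sum(drop[f] for f in shifted)          # additive for simplicity

def shapley(feature):
    n = len(features)
    others = [f for f in features if f != feature]
    total = 0.0
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (perf_drop(set(coalition) | {feature}) - perf_drop(set(coalition)))
    return total

for f in features:
    print(f, round(shapley(f), 4))                # additive toy: equals each feature's own drop
```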
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks? [51.42906577386907]
This study explores the factors influencing the performance of Large Language Models (LLMs) in causal discovery tasks.
A higher frequency of causal mentions correlates with better model performance, suggesting that extensive exposure to causal information during training enhances the models' causal discovery capabilities.
arXiv Detail & Related papers (2024-07-29T01:45:05Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
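For orientation, the simpler difference-of-means baseline that such work builds on can be sketched as follows (simulated activations; the paper's contribution is to optimize the vector directly against generation probabilities rather than use this heuristic):

```python
# Hedged sketch of the "difference of means" baseline for building a steering
# vector from contrastive activation pairs (preferred vs. dispreferred completions).
import torch

torch.manual_seed(0)
d_model = 512
acts_preferred = torch.randn(100, d_model) + 0.3     # activations on preferred completions
acts_dispreferred = torch.randn(100, d_model) - 0.3  # activations on dispreferred completions

steering_vector = acts_preferred.mean(dim=0) - acts_dispreferred.mean(dim=0)
steering_vector = steering_vector / steering_vector.norm()

# At inference, the vector would be added to the residual stream with a chosen strength.
h = torch.randn(d_model)
h_steered = h + 4.0 * steering_vector
```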
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks. In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types. These benchmarks allow us to separate the LLMs' ability to accurately predict changes resulting from interventions from their tendency to memorize facts or find other shortcuts.
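A toy example of the interventional update being probed (invented coefficients; confounding graph Z → X, Z → Y, X → Y):

```python
# Toy structural causal model with confounding. Under do(X = x0) the edge
# Z -> X is cut, so conditioning on X and intervening on X give different answers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Z = rng.normal(size=n)
X_obs = 0.8 * Z + rng.normal(size=n)           # observational X depends on Z
Y_obs = 1.5 * X_obs + 0.5 * Z + rng.normal(size=n)

x0 = 1.0
X_do = np.full(n, x0)                          # intervention: set X regardless of Z
Y_do = 1.5 * X_do + 0.5 * Z + rng.normal(size=n)

# Observational conditioning overstates the effect because of the confounder Z.
print("E[Y | X ~= 1] (observational):", Y_obs[np.abs(X_obs - x0) < 0.05].mean())
print("E[Y | do(X=1)]:", Y_do.mean())          # ~1.5, the true causal effect
```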
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- An Empirical Study of In-context Learning in LLMs for Machine Translation [10.97460689696944]
This study provides an exhaustive analysis of in-context learning (ICL) for machine translation.
We first establish that ICL is primarily example-driven and not instruction-driven.
Our analysis includes factors such as quality and quantity of demonstrations, spatial proximity, and source versus target originality.
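As an illustration of the example-driven setup analyzed (a hedged sketch with invented demonstrations; the study's prompts and language pairs may differ):

```python
# Sketch of an instruction-free, example-driven ICL prompt for translation:
# source/target demonstration pairs followed by the new source, target left blank.
demonstrations = [
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
]

def build_icl_prompt(pairs, source_sentence):
    """Concatenate demonstration pairs, then the new source sentence."""
    lines = [f"English: {src}\nFrench: {tgt}" for src, tgt in pairs]
    lines.append(f"English: {source_sentence}\nFrench:")
    return "\n\n".join(lines)

print(build_icl_prompt(demonstrations, "How much does this cost?"))
```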
arXiv Detail & Related papers (2024-01-22T16:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.