Designing Role Vectors to Improve LLM Inference Behaviour
- URL: http://arxiv.org/abs/2502.12055v1
- Date: Mon, 17 Feb 2025 17:24:37 GMT
- Title: Designing Role Vectors to Improve LLM Inference Behaviour
- Authors: Daniele Potertì, Andrea Seveso, Fabio Mercorio
- Abstract summary: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain.
This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting.
- Score: 8.995812770349605
- Abstract: The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We measure two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving task performance in relevant domains while marginally affecting unrelated tasks. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.
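The two interventions named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the role vectors are computed or applied, so everything here is an assumption: the vector is taken as a difference of mean activations (a common choice in activation-steering work), addition is a scaled shift of the hidden states, and ablation is a projection that removes the component along the vector. The function names `role_vector`, `activation_addition`, and `directional_ablation`, and the scale `alpha`, are hypothetical and not taken from the paper.

```python
import numpy as np


def role_vector(role_acts: np.ndarray, base_acts: np.ndarray) -> np.ndarray:
    """Assumed construction: mean role activation minus mean baseline activation.

    Both inputs have shape (n_samples, d_model).
    """
    return role_acts.mean(axis=0) - base_acts.mean(axis=0)


def activation_addition(hidden: np.ndarray, vec: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Reinforce the role direction by shifting every hidden state by alpha * vec."""
    return hidden + alpha * vec


def directional_ablation(hidden: np.ndarray, vec: np.ndarray) -> np.ndarray:
    """Remove the role direction by projecting it out of every hidden state."""
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        raise ValueError("role vector has zero norm; nothing to ablate")
    unit = vec / norm
    # (hidden @ unit) is each state's coefficient along the unit direction.
    return hidden - np.outer(hidden @ unit, unit)


# Toy data standing in for residual-stream activations (hypothetical values).
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 8))
role = rng.normal(size=(16, 8)) + np.array([2.0, 0, 0, 0, 0, 0, 0, 0])
vec = role_vector(role, base)

hidden = rng.normal(size=(5, 8))
steered = activation_addition(hidden, vec, alpha=0.5)
ablated = directional_ablation(hidden, vec)
```

In this sketch, `steered` moves each hidden state further along the role direction, while `ablated` has no remaining component along it. In a real model the same operations would be applied to activations at chosen layers during the forward pass, which is outside the scope of this illustration.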
Related papers
- Teaching LLMs to Refine with Tools [68.23479664749271]
Large language models (LLMs) can refine their responses based on feedback, enabling self-improvement through iterative training or test-time refinement.
We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.
arXiv Detail & Related papers (2024-12-22T05:43:50Z)
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning [26.11408084129897]
This study investigates the effect of fine-tuning on the reasoning abilities of large language models.
It addresses three questions: how task-specific fine-tuning affects overall reasoning capabilities, how fine-tuning influences Chain-of-Thought (CoT) reasoning performance, and what this implies for the faithfulness of CoT reasoning.
arXiv Detail & Related papers (2024-11-22T23:54:37Z)
- Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z)
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions are important for quantifying the impact of individual training data points on a model's predictions.
We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Most Influential Subset Selection: Challenges, Promises, and Beyond [9.479235005673683]
We study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence.
We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses.
We demonstrate that an adaptive version of these approaches, which applies them iteratively, can effectively capture the interactions among samples.
arXiv Detail & Related papers (2024-09-25T20:00:23Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks? [51.42906577386907]
This study explores the factors influencing the performance of Large Language Models (LLMs) in causal discovery tasks.
A higher frequency of causal mentions correlates with better model performance, suggesting that extensive exposure to causal information during training enhances the models' causal discovery capabilities.
arXiv Detail & Related papers (2024-07-29T01:45:05Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"Steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict the changes resulting from an intervention, separately from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)