PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
- URL: http://arxiv.org/abs/2602.15669v1
- Date: Tue, 17 Feb 2026 15:47:58 GMT
- Title: PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
- Authors: Xiachong Feng, Liang Zhao, Weihong Zhong, Yichong Huang, Yuxuan Gu, Lingpeng Kong, Xiaocheng Feng, Bing Qin
- Abstract summary: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates through three stages: Persona-Base extracts orthogonal trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve up to 91% win rates across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
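To make the three-stage recipe concrete, here is a minimal, self-contained sketch of the general pattern the abstract describes: contrastive extraction of trait directions, algebraic composition, and injection at inference. It is an illustration under assumptions, not the paper's code: the model (gpt2), the layer index, the prompt pairs, and the steering coefficients are all placeholders.

```python
# Illustrative sketch of contrastive trait-vector extraction and
# activation-space algebra. Model, layer, prompts, and coefficients
# are assumptions for demonstration, not PERSONA's actual settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, LAYER = "gpt2", 6  # stand-in model and residual-stream layer
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def mean_hidden(prompts, layer):
    """Mean last-token hidden state at `layer` over a set of prompts."""
    vecs = []
    for p in prompts:
        with torch.no_grad():
            out = model(**tok(p, return_tensors="pt"), output_hidden_states=True)
        vecs.append(out.hidden_states[layer][0, -1])
    return torch.stack(vecs).mean(0)

# Stage 1 (extraction): trait direction = mean(pos) - mean(neg), normalized.
pairs = {
    "extraversion": (["I love meeting new people and chatting with strangers."],
                     ["I prefer to stay quiet and keep to myself."]),
    "calmness":     (["I stay relaxed and even-tempered under pressure."],
                     ["I worry constantly and get upset easily."]),
}
vec = {}
for trait, (pos, neg) in pairs.items():
    v = mean_hidden(pos, LAYER) - mean_hidden(neg, LAYER)
    vec[trait] = v / v.norm()

# Stage 2 (algebra): scalar multiplication for intensity, addition for
# composition; subtracting a vector would suppress that trait instead.
steer = 4.0 * vec["extraversion"] + 2.0 * vec["calmness"]

# Stage 3 (injection): add the composed vector to the residual stream.
def add_steer(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + steer.to(hidden.dtype)
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_steer)  # GPT-2 layer path
ids = tok("Tell me about your weekend.", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40,
                                pad_token_id=tok.eos_token_id)[0]))
handle.remove()
```

A Persona-Flow-style dynamic variant would recompute the composition coefficients per turn from the conversation context rather than fixing them once.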
Related papers
- Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
The utility of Role-Playing Language Agents in sociological research is growing alongside the adoption of Large Language Models. For realism in social simulation, Role-Playing Language Agents must adhere to the personas defined by their character profiles. We propose a novel, theory-driven method that dynamically estimates context-dependent persona importance and integrates it into weighted reward-guided decoding (a toy decoding sketch appears after this list).
arXiv Detail & Related papers (2026-03-02T04:37:16Z)
- Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs
Personality control in Role-Playing Agents (RPAs) is commonly achieved via training-free methods. We propose a contrastive Sparse AutoEncoder framework that learns facet-level personality control vectors aligned with the Big Five 30-facet model (a minimal SAE sketch appears after this list).
arXiv Detail & Related papers (2026-02-22T12:39:02Z)
- The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models
We propose the Soul Engine, a framework based on the Linear Representation Hypothesis. Using a dual-head architecture on a frozen Qwen-2.5 base, we extract disentangled personality vectors. The model achieves a Mean Squared Error (MSE) of 0.011 against psychological ground truth.
arXiv Detail & Related papers (2025-12-08T02:00:57Z)
- Profile-LLM: Dynamic Profile Optimization for Realistic Personality Expression in LLMs
PersonaPulse is a framework that iteratively enhances role-play prompts while integrating a situational response benchmark as a scoring tool. Quantitative evaluations demonstrate that the prompts generated by PersonaPulse outperform those of prior work. For certain personality traits, the extent of personality evocation can be partially controlled by pausing the optimization process.
arXiv Detail & Related papers (2025-11-25T02:31:40Z)
- Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs
Large Language Models exhibit implicit personalities in their generation, but reliably controlling or aligning these traits to meet specific needs remains an open challenge. We propose a novel pipeline that extracts hidden state activations from transformer layers using the Big Five Personality Traits. Our findings reveal that personality traits occupy a low-rank shared subspace, and that these latent structures can be transformed into actionable mechanisms for effective steering (a small singular-value sketch appears after this list).
arXiv Detail & Related papers (2025-10-29T05:56:39Z)
- Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks
Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks.
arXiv Detail & Related papers (2025-09-11T21:43:49Z)
- The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z)
- GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers, guided by token-level attribution signals, and normalizes activations to preserve representational scale (a toy attribution sketch appears after this list). It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z)
- Neuron-based Personality Trait Induction in Large Language Models
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs (a toy neuron-level sketch appears after this list).
arXiv Detail & Related papers (2024-10-16T07:47:45Z)
- Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Activation intervention has emerged as an effective and economical method for modifying the behavior of large language models (LLMs). We propose Semantics-Adaptive Dynamic Intervention (SADI), a novel method that constructs a dynamic steering vector to intervene on model activations at inference time (a compact sketch appears after this list). Experimental results show that SADI outperforms established baselines by substantial margins, improving task performance without training.
arXiv Detail & Related papers (2024-10-16T06:58:49Z)
- Activation Scaling for Steering and Interpreting Language Models
We argue that successfully intervening on a model is a prerequisite for interpreting its internal workings.
We establish a three-term objective: a successful intervention should flip the correct token with the wrong one (and vice versa), while otherwise perturbing the model as little as possible.
Using gradient-based optimization, this objective lets us learn (and later evaluate) a specific kind of efficient and interpretable intervention (a single-scalar toy version appears after this list).
arXiv Detail & Related papers (2024-10-07T12:01:32Z)
- ALP: Action-Aware Embodied Learning for Perception
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective (a toy version of the inverse-dynamics term appears after this list).
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
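For the dynamic-importance decoding paper above, here is a toy sketch of weighted reward-guided decoding: at each step, candidate tokens are rescored by the LM logit plus an importance-weighted persona reward. The lexicon reward, the keyword-based importance estimator, and the gpt2 backbone are crude stand-ins for that paper's learned, theory-driven components.

```python
# Toy weighted reward-guided decoding with a dynamic importance weight.
# persona_reward() and importance() are illustrative stand-ins, not the
# paper's learned components.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
PERSONA_WORDS = {" cheerful", " friendly", " upbeat"}  # toy persona lexicon

def persona_reward(token_id):
    """+1 when the candidate token matches the persona lexicon (toy)."""
    return 1.0 if tok.decode([token_id]) in PERSONA_WORDS else 0.0

def importance(ids):
    """Context-dependent weight: persona matters more for personal questions."""
    return 2.0 if "you" in tok.decode(ids).lower() else 0.5

@torch.no_grad()
def generate(prompt, steps=20, top_k=50):
    ids = tok(prompt, return_tensors="pt").input_ids[0]
    for _ in range(steps):
        logits = lm(ids.unsqueeze(0)).logits[0, -1]
        topv, topi = logits.topk(top_k)
        rewards = torch.tensor([persona_reward(i) for i in topi.tolist()])
        scores = topv + importance(ids) * rewards  # weighted reward-guided score
        ids = torch.cat([ids, topi[scores.argmax()].unsqueeze(0)])
    return tok.decode(ids)

print(generate("How do you feel today? I feel"))
```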
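For the contrastive-SAE paper above, this minimal sketch shows the mechanics on synthetic data: train a sparse autoencoder on activations, find the latent that most separates trait-positive from trait-negative examples, and read its decoder column off as a facet control vector. The dimensions, sparsity weight, and planted data are all assumptions.

```python
# Minimal contrastive-SAE sketch: reconstruct activations under an L1
# sparsity penalty, then pick the most trait-contrastive latent. The
# synthetic activations stand in for real model hidden states.
import torch
import torch.nn as nn

D, K = 64, 256  # activation dim, SAE dictionary size (assumptions)
sae_enc, sae_dec = nn.Linear(D, K), nn.Linear(K, D, bias=False)
opt = torch.optim.Adam(list(sae_enc.parameters()) + list(sae_dec.parameters()), lr=1e-3)

# Placeholder data: "positive" activations shifted along a hidden direction.
direction = torch.randn(D); direction /= direction.norm()
neg = torch.randn(512, D)
pos = torch.randn(512, D) + 3.0 * direction
acts = torch.cat([pos, neg])

for step in range(500):
    z = torch.relu(sae_enc(acts))           # sparse codes
    recon = sae_dec(z)
    loss = (recon - acts).pow(2).mean() + 1e-3 * z.abs().mean()  # recon + L1
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    z_pos = torch.relu(sae_enc(pos)).mean(0)
    z_neg = torch.relu(sae_enc(neg)).mean(0)
    facet = (z_pos - z_neg).argmax()         # most trait-contrastive latent
    control_vec = sae_dec.weight[:, facet]   # decoder column = control direction
print("cosine(control_vec, planted direction):",
      torch.cosine_similarity(control_vec, direction, dim=0).item())
```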
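For the hybrid-layer-selection paper above, the low-rank shared subspace claim is easy to probe numerically: stack the trait direction vectors into a matrix and inspect its singular values. The planted two-dimensional basis here stands in for vectors extracted from real activations.

```python
# Quick check for a low-rank shared subspace across trait vectors:
# if the five vectors mostly live in a 2-d subspace, the singular
# value spectrum concentrates in the first two components.
import torch

D = 768
basis = torch.randn(D, 2)                   # planted 2-d shared subspace
traits = torch.stack([basis @ torch.randn(2) + 0.1 * torch.randn(D)
                      for _ in range(5)])   # five noisy trait vectors
_, S, Vh = torch.linalg.svd(traits, full_matrices=False)
print("singular values:", S.round(decimals=2))  # energy concentrates in ~2 dims
shared = Vh[:2]                             # rows spanning the shared subspace
```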
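For GrAInS above, a minimal version of the attribution idea: backpropagate a contrastive preference loss to one layer's activations, weight positions by their activation-times-gradient attribution, and normalize the resulting direction so its scale stays controlled. The single layer, the token pair, and the aggregation scheme are simplifying assumptions, not the paper's exact procedure.

```python
# Gradient-attribution steering direction, GrAInS-flavored sketch.
# Layer, tokens, and aggregation are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER = 6

ids = tok("My personality is very", return_tensors="pt").input_ids
out = lm(ids, output_hidden_states=True)
h = out.hidden_states[LAYER]
h.retain_grad()  # keep gradients for this non-leaf tensor

logits = out.logits[0, -1]
pref = tok.encode(" outgoing")[0]    # preferred continuation (assumption)
dispref = tok.encode(" anxious")[0]  # dispreferred continuation (assumption)
loss = logits[pref] - logits[dispref]
loss.backward()

# Token-level attribution, then aggregate into one steering direction and
# normalize it to preserve representational scale.
attr = (h * h.grad).sum(-1)[0]           # per-token attribution scores
weights = torch.softmax(attr, dim=0)
direction = (weights[:, None] * h.grad[0]).sum(0)
direction = direction / direction.norm()
print("steering direction shape:", tuple(direction.shape))
```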
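For the neuron-based induction paper above, a toy neuron-level variant: pick the MLP neurons whose activations best separate trait-positive from trait-negative prompts and amplify only those at inference. The single prompt pair, layer, neuron count, and gain are illustrative, not the paper's settings.

```python
# Toy neuron-level trait induction: locate trait-sensitive MLP neurons
# by a contrastive activation difference, then amplify just those.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, TOP_K, GAIN = 6, 16, 5.0

def mlp_acts(prompt):
    """Capture post-activation MLP features at LAYER for the last token."""
    store = {}
    hook = lm.transformer.h[LAYER].mlp.act.register_forward_hook(
        lambda m, i, o: store.update(a=o[0, -1].detach()))
    with torch.no_grad():
        lm(**tok(prompt, return_tensors="pt"))
    hook.remove()
    return store["a"]

diff = mlp_acts("I love loud parties.") - mlp_acts("I avoid all social events.")
trait_neurons = diff.abs().topk(TOP_K).indices  # most trait-sensitive neurons

def amplify(module, inputs, output):
    output[..., trait_neurons] *= GAIN  # in-place boost of selected neurons
    return output

handle = lm.transformer.h[LAYER].mlp.act.register_forward_hook(amplify)
ids = tok("On weekends I usually", return_tensors="pt")
print(tok.decode(lm.generate(**ids, max_new_tokens=30,
                             pad_token_id=tok.eos_token_id)[0]))
handle.remove()
```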
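For SADI above, the distinguishing idea is that the steering vector is built from the current input rather than fixed in advance. In this compact sketch, a contrastive activation difference selects critical hidden dimensions once, and at inference those dimensions of each input's own activation are scaled, so the intervention adapts to input semantics. The layer, prompts, and scaling factor are assumptions.

```python
# Semantics-adaptive dynamic steering, SADI-flavored sketch: a fixed mask
# of critical dimensions, but a steering vector derived from each input.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, TOP_K, DELTA = 6, 64, 3.0

def last_hidden(prompt):
    with torch.no_grad():
        out = lm(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

diff = last_hidden("I feel cheerful and outgoing.") - \
       last_hidden("I feel gloomy and withdrawn.")
mask = torch.zeros_like(diff)
mask[diff.abs().topk(TOP_K).indices] = 1.0  # critical dimensions

def steer(module, inputs, output):
    hidden = output[0]
    # Dynamic vector: element-wise product of the mask with the input's own
    # activations, so steering strength follows the input semantics.
    return (hidden + DELTA * mask * hidden, *output[1:])

handle = lm.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Today I want to", return_tensors="pt")
print(tok.decode(lm.generate(**ids, max_new_tokens=30,
                             pad_token_id=tok.eos_token_id)[0]))
handle.remove()
```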
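For the activation-scaling paper above, a single-scalar toy version of learning an intervention by gradient descent: one trainable scalar scales a fixed direction, optimized so a target token overtakes the original prediction while a penalty keeps the intervention small. The random direction and the token pair are placeholders, not the paper's setup.

```python
# Learning an activation-scaling intervention: only `scale` is trained,
# under a flip-the-token objective plus a smallness penalty.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in lm.parameters():
    p.requires_grad_(False)

LAYER = 6
direction = torch.randn(lm.config.n_embd)
direction /= direction.norm()
scale = torch.zeros(1, requires_grad=True)  # the only trainable quantity
opt = torch.optim.Adam([scale], lr=0.1)

def intervene(module, inputs, output):
    return (output[0] + scale * direction, *output[1:])

handle = lm.transformer.h[LAYER].register_forward_hook(intervene)
ids = tok("The weather today is", return_tensors="pt").input_ids
wrong = tok.encode(" bad")[0]        # original prediction (assumption)
right = tok.encode(" wonderful")[0]  # desired prediction (assumption)

for step in range(50):
    logits = lm(ids).logits[0, -1]
    # Flip objective: push the desired token above the original one, with
    # an L2 term keeping the intervention as small as possible.
    loss = -(logits[right] - logits[wrong]) + 0.01 * scale.pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()

handle.remove()
print("learned scale:", scale.item())
```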
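ALP sits outside the personality literature, but its inverse-dynamics term is easy to illustrate: train an observation encoder so that the action taken between two consecutive observations can be predicted from their embeddings (the reinforcement-learning policy term is omitted here). Dimensions and the random data are placeholders.

```python
# Toy inverse-dynamics objective: predict a_t from (o_t, o_{t+1}) embeddings,
# shaping the encoder to carry action-relevant information.
import torch
import torch.nn as nn

OBS, EMB, ACTIONS = 32, 16, 4
enc = nn.Sequential(nn.Linear(OBS, EMB), nn.ReLU())
inv = nn.Linear(2 * EMB, ACTIONS)
opt = torch.optim.Adam([*enc.parameters(), *inv.parameters()], lr=1e-3)

o_t, o_next = torch.randn(64, OBS), torch.randn(64, OBS)  # placeholder rollout
a_t = torch.randint(0, ACTIONS, (64,))
for _ in range(100):
    logits = inv(torch.cat([enc(o_t), enc(o_next)], dim=-1))
    loss = nn.functional.cross_entropy(logits, a_t)  # inverse-dynamics loss
    opt.zero_grad(); loss.backward(); opt.step()
```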