Linear Personality Probing and Steering in LLMs: A Big Five Study
- URL: http://arxiv.org/abs/2512.17639v1
- Date: Fri, 19 Dec 2025 14:41:09 GMT
- Title: Linear Personality Probing and Steering in LLMs: A Big Five Study
- Authors: Michel Frising, Daniel Balcells
- Abstract summary: We investigate whether linear directions aligned with the Big Five personality traits can be used for probing and steering model behavior. Our results suggest that linear directions aligned with trait scores are effective probes for personality detection.
- Score: 0.7933052462113936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) exhibit distinct and consistent personalities that greatly impact trust and engagement. While this means that personality frameworks would be highly valuable tools to characterize and control LLMs' behavior, current approaches remain either costly (post-training) or brittle (prompt engineering). Probing and steering via linear directions has recently emerged as a cheap and efficient alternative. In this paper, we investigate whether linear directions aligned with the Big Five personality traits can be used for probing and steering model behavior. Using Llama 3.3 70B, we generate descriptions of 406 fictional characters and their Big Five trait scores. We then prompt the model with these descriptions and questions from the Alpaca questionnaire, allowing us to sample hidden activations that vary along personality traits in known, quantifiable ways. Using linear regression, we learn a set of per-layer directions in activation space, and test their effectiveness for probing and steering model behavior. Our results suggest that linear directions aligned with trait-scores are effective probes for personality detection, while their steering capabilities strongly depend on context, producing reliable effects in forced-choice tasks but limited influence in open-ended generation or when additional context is present in the prompt.
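The probing recipe described in the abstract (regress trait scores onto hidden activations to obtain a per-layer direction) can be sketched with synthetic data. The dimensions, noise level, and variable names below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: the paper samples hidden activations from
# Llama 3.3 70B for 406 character descriptions; here we synthesize them.
n_samples, d_model = 406, 64
true_direction = rng.normal(size=d_model)
true_direction /= np.linalg.norm(true_direction)

trait_scores = rng.uniform(1.0, 5.0, size=n_samples)       # e.g. openness
activations = (
    np.outer(trait_scores, true_direction)                 # trait-aligned signal
    + 0.1 * rng.normal(size=(n_samples, d_model))          # residual noise
)

# Learn one per-layer direction by ordinary least squares:
# trait_score ~ activation @ w + b.
X = np.hstack([activations, np.ones((n_samples, 1))])      # bias column
coef, *_ = np.linalg.lstsq(X, trait_scores, rcond=None)
direction, bias = coef[:-1], coef[-1]

# Probe: project activations onto the learned direction.
predicted = activations @ direction + bias
corr = np.corrcoef(predicted, trait_scores)[0, 1]
print(f"in-sample probe correlation: {corr:.3f}")
```

On real activations the regression would be fit separately per layer and evaluated out of sample; the high in-sample correlation here only confirms that the synthetic setup is recoverable.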
Related papers
- PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra [84.59328460968872]
Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates.
arXiv Detail & Related papers (2026-02-17T15:47:58Z)
- Steering Latent Traits, Not Learned Facts: An Empirical Study of Activation Control Limits [0.0]
Large language models (LLMs) require precise behavior control for safe and effective deployment across diverse applications. We focus on how steering effectiveness varies across different behavior types and whether the nature of target behaviors can predict steering success.
arXiv Detail & Related papers (2025-11-23T04:28:41Z)
- Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs [10.99947795031516]
Large Language Models exhibit implicit personalities in their generations, but reliably controlling or aligning these traits to meet specific needs remains an open challenge. We propose a novel pipeline that extracts hidden state activations from transformer layers using the Big Five Personality Traits. Our findings reveal that personality traits occupy a low-rank shared subspace, and that these latent structures can be transformed into actionable mechanisms for effective steering.
arXiv Detail & Related papers (2025-10-29T05:56:39Z)
- The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z)
- Scaling Personality Control in LLMs with Big Five Scaler Prompts [1.534667887016089]
We present Big5-Scaler, a prompt-based framework for conditioning large language models with controllable personality traits. By embedding numeric trait values into natural language prompts, our method enables fine-grained personality control without additional training.
arXiv Detail & Related papers (2025-08-08T09:11:05Z)
- Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires [3.6001840369062386]
This work applies psychological tools to Large Language Models in diverse scenarios to generate personality profiles. Our findings reveal that LLMs exhibit unique traits, varying characteristics, and distinct personality profiles even within the same family of models.
arXiv Detail & Related papers (2025-02-07T16:12:52Z)
- Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z)
- Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors [4.814107439144414]
We introduce a novel approach that uncovers latent personality dimensions in large language models (LLMs).
Our experiments show that LLMs "rediscover" core personality traits such as extraversion, agreeableness, conscientiousness, neuroticism, and openness without relying on direct questionnaire inputs.
We can use the derived principal components to assess personality along the Big Five dimensions, and achieve improvements in average personality prediction accuracy of up to 5% over fine-tuned models.
arXiv Detail & Related papers (2024-09-16T00:24:40Z)
- LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to infer a person's personality traits from their social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
- Eliciting Personality Traits in Large Language Models [0.0]
Large Language Models (LLMs) are increasingly being utilized by both candidates and employers in the recruitment context.
This study seeks to obtain a better understanding of such models by examining their output variations based on different input prompts.
arXiv Detail & Related papers (2024-02-13T10:09:00Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
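Several of the papers above (the main paper's steering experiments, PERSONA's activation-vector algebra, and the activation-space steering pipeline) share one core operation: adding a scaled direction to a hidden state. A minimal sketch follows, with an illustrative function name and coefficient rather than any paper's actual API:

```python
import numpy as np

def steer(hidden_state: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift an activation along a unit-normalized trait direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden_state + alpha * unit

rng = np.random.default_rng(1)
h = rng.normal(size=128)            # stand-in for one token's hidden state
d = rng.normal(size=128)            # stand-in for a learned trait direction

h_plus = steer(h, d, alpha=4.0)     # push toward the trait
h_minus = steer(h, d, alpha=-4.0)   # push away from it

# The projection along the direction shifts by exactly alpha.
unit = d / np.linalg.norm(d)
print((h_plus - h) @ unit, (h_minus - h) @ unit)
```

In practice this operation would run inside the model at generation time, e.g. via a PyTorch forward hook on the chosen layer's residual stream, which is where context-dependent steering effects like those reported in the main paper would appear.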
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.