Your Language Model Secretly Contains Personality Subnetworks
- URL: http://arxiv.org/abs/2602.07164v1
- Date: Fri, 06 Feb 2026 20:03:28 GMT
- Title: Your Language Model Secretly Contains Personality Subnetworks
- Authors: Ruimeng Ye, Zihan Wang, Zinan Ling, Yang Xiao, Manling Li, Xiaolong Ma, Bo Hui
- Abstract summary: We show that large language models already contain persona-specialized subnetworks in their parameter space. Our method is entirely training-free and relies solely on the language model's existing parameter space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans shift between different personas depending on social context. Large Language Models (LLMs) demonstrate a similar flexibility in adopting different personas and behaviors. Existing approaches, however, typically adapt such behavior through external knowledge such as prompting, retrieval-augmented generation (RAG), or fine-tuning. We ask: do LLMs really need external context or parameters to adapt to different behaviors, or do they already have such knowledge embedded in their parameters? In this work, we show that LLMs already contain persona-specialized subnetworks in their parameter space. Using small calibration datasets, we identify distinct activation signatures associated with different personas. Guided by these statistics, we develop a masking strategy that isolates lightweight persona subnetworks. Building on these findings, we further ask: how can we discover opposing subnetworks within the model that lead to binary-opposing personas, such as introvert versus extrovert? To further enhance separation in binary-opposition scenarios, we introduce a contrastive pruning strategy that identifies parameters responsible for the statistical divergence between opposing personas. Our method is entirely training-free and relies solely on the language model's existing parameter space. Across diverse evaluation settings, the resulting subnetworks exhibit significantly stronger persona alignment than baselines that require external knowledge, while being more efficient. Our findings suggest that diverse human-like behaviors are not merely induced in LLMs but are already embedded in their parameter space, pointing toward a new perspective on controllable and interpretable personalization in large language models.
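To make the pipeline concrete, the following is a minimal PyTorch sketch of the two steps the abstract describes: scoring parameters from activation statistics collected on a small per-persona calibration set, then masking down to a subnetwork, with a contrastive variant for opposing personas. The toy MLP, the random calibration data, the Wanda-style scoring rule (|weight| scaled by input-activation norm), the keep ratios, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: calibration-driven persona masks and contrastive
# pruning on a toy MLP. The scoring rule and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

def importance_scores(model, calib_batch):
    """Per-parameter importance from activation statistics on a calibration set.
    Assumption: |weight| * input-activation norm (a Wanda-style proxy)."""
    scores, x = {}, calib_batch
    for name, layer in model.named_children():
        if isinstance(layer, nn.Linear):
            act_norm = x.norm(dim=0)                      # (in_features,)
            scores[name] = layer.weight.abs() * act_norm  # (out_features, in_features)
        x = layer(x)                                      # forward to the next layer
    return scores

def topk_mask(score, keep_ratio):
    """Binary mask keeping the highest-scoring fraction of parameters."""
    k = max(1, int(score.numel() * keep_ratio))
    thresh = score.flatten().topk(k).values.min()
    return (score >= thresh).float()

# Small calibration sets, one per persona (random toy data standing in for text).
calib_introvert = torch.randn(32, 16)
calib_extrovert = torch.randn(32, 16) + 0.5

s_intro = importance_scores(model, calib_introvert)
s_extro = importance_scores(model, calib_extrovert)

# Persona subnetwork: mask guided by one persona's own activation statistics.
intro_mask = {n: topk_mask(s, keep_ratio=0.5) for n, s in s_intro.items()}

# Contrastive pruning: keep only the parameters whose importance diverges most
# between the two opposing personas.
contrastive_mask = {n: topk_mask((s_intro[n] - s_extro[n]).abs(), keep_ratio=0.3)
                    for n in s_intro}

# Apply a mask in place; training-free, since weights are zeroed but never updated.
with torch.no_grad():
    for name, layer in model.named_children():
        if isinstance(layer, nn.Linear):
            layer.weight.mul_(contrastive_mask[name])
```

In a real LLM the same loop would presumably run over the attention and MLP projection matrices, with calibration batches of persona-labeled text passed through the frozen model. The point the abstract emphasizes survives in the sketch: only a mask is derived from statistics, so no gradient update ever touches the weights.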
Related papers
- Us-vs-Them bias in Large Language Models
We find consistent ingroup-positive and outgroup-negative associations across foundational large language models. Among the personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity.
arXiv Detail & Related papers (2025-12-03T07:11:22Z)
- Mind the Gap: Linguistic Divergence and Adaptation Strategies in Human-LLM Assistant vs. Human-Human Interactions
Large Language Models (LLMs) are increasingly deployed in customer-facing applications. Our study shows significant differences in grammatical fluency, politeness, and lexical diversity in user language between the two settings. To enhance robustness to post-launch changes in communication style, we experimented with two strategies.
arXiv Detail & Related papers (2025-10-03T00:45:37Z)
- If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
We introduce LIFESTATE-BENCH, a benchmark designed to assess lifelong learning in large language models (LLMs). Our fact-checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking, across both parametric and non-parametric approaches.
arXiv Detail & Related papers (2025-03-30T16:50:57Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Large language models (LMs) are distributions over strings. We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We find that the rank of the RLM is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Alignment with human preferences prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- Quantifying the Persona Effect in LLM Simulations
Large language models (LLMs) have shown remarkable promise in simulating human language and behavior.
This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives.
We find that persona variables account for less than 10% of the variance in annotations in existing subjective NLP datasets.
arXiv Detail & Related papers (2024-02-16T16:35:35Z)
- Eliciting Personality Traits in Large Language Models
Large Language Models (LLMs) are increasingly being utilized by both candidates and employers in the recruitment context.
This study seeks a better understanding of such models by examining how their outputs vary with different input prompts.
arXiv Detail & Related papers (2024-02-13T10:09:00Z)
- On the steerability of large language models toward data-driven personas
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
arXiv Detail & Related papers (2023-10-17T20:22:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.