Exploring the Personality Traits of LLMs through Latent Features Steering
- URL: http://arxiv.org/abs/2410.10863v2
- Date: Sun, 16 Feb 2025 22:19:15 GMT
- Title: Exploring the Personality Traits of LLMs through Latent Features Steering
- Authors: Shu Yang, Shenzhe Zhu, Liang Liu, Lijie Hu, Mengdi Li, Di Wang
- Abstract summary: We investigate how factors, such as cultural norms and environmental stressors, encoded within large language models (LLMs) shape their personality traits. We propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model.
- Score: 12.142248881876355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have significantly advanced dialogue systems and role-playing agents through their ability to generate human-like text. While prior studies have shown that LLMs can exhibit distinct and consistent personalities, the mechanisms through which these models encode and express specific personality traits remain poorly understood. To address this, we investigate how various factors, such as cultural norms and environmental stressors, encoded within LLMs, shape their personality traits, guided by the theoretical framework of social determinism. Inspired by related work on LLM interpretability, we propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model, thereby eliminating the need for retraining. Furthermore, we analyze the implications of these factors for model safety, focusing on their impact through the lens of personality.
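As an illustration of what "extracting and steering latent features" can look like in practice, here is a minimal sketch using a difference-of-means direction, a technique common in the interpretability literature. It is not necessarily the procedure used in the paper. The synthetic activations, the function names `steering_vector` and `steer`, and the strength `alpha` are all assumptions made for this example.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Unit-norm difference between the mean activations of two prompt sets.

    pos_acts / neg_acts: arrays of shape (n_prompts, hidden_dim), standing in
    for hidden states collected with and without the factor of interest.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(hidden, direction, alpha):
    """Shift every hidden state by alpha along the steering direction."""
    return hidden + alpha * direction

# Synthetic stand-in for model activations (assumption: no real LLM is used here).
rng = np.random.default_rng(0)
hidden_dim = 8
true_dir = np.zeros(hidden_dim)
true_dir[0] = 1.0  # the hypothetical "factor" lives along the first axis

pos = rng.normal(size=(200, hidden_dim)) + 2.0 * true_dir
neg = rng.normal(size=(200, hidden_dim)) - 2.0 * true_dir

v = steering_vector(pos, neg)

# Apply the steering vector to a batch of hidden states at inference time.
hidden = rng.normal(size=(5, hidden_dim))
steered = steer(hidden, v, alpha=4.0)

# Each state moves exactly alpha units along v; no weights are changed.
projection_shift = (steered - hidden) @ v
```

In a real model, the shift would typically be applied to a chosen layer's hidden states during the forward pass, leaving the weights untouched, which is what makes this family of methods training-free.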
Related papers
- Structured Personality Control and Adaptation for LLM Agents [11.050618253938126]
Large Language Models (LLMs) are increasingly shaping human-computer interaction (HCI). We present a framework that models LLM personality via Jungian psychological types. This design allows the agent to maintain nuanced traits while dynamically adjusting to interaction demands.
arXiv Detail & Related papers (2026-01-15T03:15:24Z) - Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning [54.12174882424842]
Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. We propose a novel interpretability framework to systematically analyze the roles and behaviors of attention heads.
arXiv Detail & Related papers (2025-12-03T10:24:34Z) - Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs [10.99947795031516]
Large Language Models exhibit implicit personalities in their generation, but reliably controlling or aligning these traits to meet specific needs remains an open challenge. We propose a novel pipeline that extracts hidden state activations from transformer layers using the Big Five Personality Traits. Our findings reveal that personality traits occupy a low-rank shared subspace, and that these latent structures can be transformed into actionable mechanisms for effective steering.
arXiv Detail & Related papers (2025-10-29T05:56:39Z) - Large Language Model Sourcing: A Survey [84.63438376832471]
Large language models (LLMs) have revolutionized artificial intelligence, shifting from supporting objective tasks to empowering subjective decision-making. Due to the black-box nature of LLMs and the human-like quality of their generated content, issues such as hallucinations, bias, unfairness, and copyright infringement become significant. This survey presents a systematic investigation into provenance tracking for content generated by LLMs, organized around four interrelated dimensions.
arXiv Detail & Related papers (2025-10-11T10:52:30Z) - IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization [66.6349183886101]
We propose IROTE, a novel in-context method for stable and transferable trait elicitation. We show that one single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks.
arXiv Detail & Related papers (2025-08-12T08:04:28Z) - A Comparative Study of Large Language Models and Human Personality Traits [6.354326674890978]
Large Language Models (LLMs) have demonstrated human-like capabilities in language comprehension and generation. This study investigates whether LLMs exhibit personality-like traits and how these traits compare with human personality.
arXiv Detail & Related papers (2025-05-01T15:10:15Z) - An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI).
This paper explores potential areas where statisticians can make important contributions to the development of LLMs.
We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z) - Investigating the Zone of Proximal Development of Language Models for In-Context Learning [59.91708683601029]
We introduce a learning analytics framework to analyze the in-context learning (ICL) behavior of large language models (LLMs).
We adapt the Zone of Proximal Development (ZPD) theory to ICL, measuring the ZPD of LLMs based on model performance on individual examples.
Our findings reveal a series of intricate and multifaceted behaviors of ICL, providing new insights into understanding and leveraging this technique.
arXiv Detail & Related papers (2025-02-10T19:36:21Z) - Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z) - Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - An LLM Feature-based Framework for Dialogue Constructiveness Assessment [8.87747076871578]
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructiveness outcomes following dialogues for such use cases.
These objectives can be achieved by training either interpretable feature-based models or neural models such as pre-trained language models.
We propose an LLM feature-based framework for dialogue constructiveness assessment that combines the strengths of feature-based and neural approaches.
arXiv Detail & Related papers (2024-06-20T22:10:52Z) - Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions [2.6080756513915824]
Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions.
This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information.
arXiv Detail & Related papers (2024-06-18T02:32:57Z) - Explaining Large Language Models Decisions Using Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
arXiv Detail & Related papers (2024-03-29T22:49:43Z) - PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models [25.657579792829743]
We empirically evaluate how role-playing prompting influences Theory-of-Mind (ToM) reasoning capabilities.
We propose the mechanism that, beyond the inherent variance in the complexity of reasoning tasks, performance differences arise because of socially-motivated prompting differences.
arXiv Detail & Related papers (2024-03-04T17:34:34Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Is Cognition and Action Consistent or Not: Investigating Large Language
Model's Personality [12.162460438332152]
We investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires.
Our goal is to evaluate the consistency between LLMs' professed personality inclinations and their actual "behavior".
We propose hypotheses for the observed results based on psychological theories and metrics.
arXiv Detail & Related papers (2024-02-22T16:32:08Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z) - LLMs Simulate Big Five Personality Traits: Further Evidence [51.13560635563004]
We analyze the personality traits simulated by Llama2, GPT4, and Mixtral.
This contributes to the broader understanding of the capabilities of LLMs to simulate personality traits.
arXiv Detail & Related papers (2024-01-31T13:45:25Z) - Personality Traits in Large Language Models [44.908741466152215]
Personality is a key factor determining the effectiveness of communication.
We present a comprehensive method for administering and validating personality tests on widely-used large language models.
We discuss application and ethical implications of the measurement and shaping method, in particular regarding responsible AI.
arXiv Detail & Related papers (2023-07-01T00:58:51Z) - Revealing the structure of language model capabilities [4.037009782513272]
We analyzed data from 29 different large language models across 27 cognitive tasks.
Results reveal a consistent structure in the capabilities of different LLMs.
We suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.
arXiv Detail & Related papers (2023-06-14T15:43:25Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Influence of External Information on Large Language Models Mirrors
Social Cognitive Patterns [51.622612759892775]
Social cognitive theory explains how people learn and acquire knowledge through observing others.
Recent years have witnessed the rapid development of large language models (LLMs).
LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors.
arXiv Detail & Related papers (2023-05-08T16:10:18Z) - Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
To answer these questions, we introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.