Exploring the Personality Traits of LLMs through Latent Features Steering
- URL: http://arxiv.org/abs/2410.10863v2
- Date: Sun, 16 Feb 2025 22:19:15 GMT
- Title: Exploring the Personality Traits of LLMs through Latent Features Steering
- Authors: Shu Yang, Shenzhe Zhu, Liang Liu, Lijie Hu, Mengdi Li, Di Wang
- Abstract summary: We investigate how factors, such as cultural norms and environmental stressors, encoded within large language models (LLMs) shape their personality traits.
We propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model.
- Score: 12.142248881876355
- Abstract: Large language models (LLMs) have significantly advanced dialogue systems and role-playing agents through their ability to generate human-like text. While prior studies have shown that LLMs can exhibit distinct and consistent personalities, the mechanisms through which these models encode and express specific personality traits remain poorly understood. To address this, and guided by the theoretical framework of social determinism, we investigate how factors encoded within LLMs, such as cultural norms and environmental stressors, shape their personality traits. Inspired by related work on LLM interpretability, we propose a training-free approach that modifies the model's behavior by extracting and steering the latent features corresponding to these factors, eliminating the need for retraining. Furthermore, we analyze the implications of these factors for model safety, focusing on their impact through the lens of personality.
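To make the steering idea concrete, below is a minimal sketch of activation steering with a forward hook in PyTorch, applied to a Hugging Face causal LM. The model choice (gpt2), the layer index, the steering strength, and the random stand-in direction are illustrative assumptions only; the paper derives its steering directions from latent features tied to factors such as cultural norms, which this sketch does not reproduce.

```python
# Hedged sketch: steer a causal LM by adding a fixed direction to the
# hidden states of one transformer block at generation time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                     # illustrative choice of model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6                           # hypothetical layer to steer
alpha = 4.0                             # hypothetical steering strength

# Stand-in for an extracted latent-feature direction; in practice this
# would come from model activations (e.g., a difference of mean
# activations between contrastive prompts), not from random noise.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()

def steer_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden
    # states; shift those states along the chosen direction.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.transformer.h[layer_idx].register_forward_hook(steer_hook)

prompt = "Describe how you usually react under pressure."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # removing the hook restores the unsteered model
```

Because the hook is attached and removed at runtime, the same loaded model can be compared with and without steering, which matches the training-free spirit of the approach.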
Related papers
- Investigating the Zone of Proximal Development of Language Models for In-Context Learning [59.91708683601029]
We introduce a learning analytics framework to analyze the in-context learning (ICL) behavior of large language models (LLMs).
We adapt the Zone of Proximal Development (ZPD) theory to ICL, measuring the ZPD of LLMs based on model performance on individual examples.
Our findings reveal a series of intricate and multifaceted behaviors of ICL, providing new insights into understanding and leveraging this technique.
arXiv Detail & Related papers (2025-02-10T19:36:21Z) - Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z) - Language Models Show Stable Value Orientations Across Diverse Role-Plays [4.906478894661688]
We show that large language models (LLMs) exhibit consistent value orientations despite adopting diverse personas.
We introduce the role-play-at-scale methodology, which involves prompting LLMs with randomized, diverse personas.
This approach reveals consistent patterns in LLM responses across diverse role-play scenarios, indicating deeply encoded inherent tendencies.
arXiv Detail & Related papers (2024-08-16T23:24:10Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - An LLM Feature-based Framework for Dialogue Constructiveness Assessment [8.87747076871578]
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness, and (ii) predicting constructiveness outcomes following dialogues for such use cases.
These objectives can be achieved by training either interpretable feature-based models or neural models such as pre-trained language models.
We propose an LLM feature-based framework for dialogue constructiveness assessment that combines the strengths of feature-based and neural approaches.
arXiv Detail & Related papers (2024-06-20T22:10:52Z) - Explaining Large Language Models Decisions Using Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output (a toy sketch of this subset-based attribution appears after this list).
arXiv Detail & Related papers (2024-03-29T22:49:43Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations (a toy LID estimate appears after this list).
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality [11.660251022962141]
We investigate the validity of conventional personality questionnaires in capturing the human-like personality traits of Large Language Models (LLMs).
Our objective is to assess the congruence between the personality traits LLMs claim to possess and their demonstrated tendencies in real-world scenarios.
arXiv Detail & Related papers (2024-02-22T16:32:08Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI), a tool for evaluating the personality of LLMs.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce specific personalities in LLMs in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
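As noted in the Shapley-value entry above, here is a toy sketch of attributing an LLM's output to individual prompt components with exact Shapley values. The value function is a keyword-based stand-in so the snippet runs without a model; in the actual setting it would query the LLM with only the chosen components present and score the output (e.g., the probability of a particular answer).

```python
# Hedged sketch: exact Shapley values over prompt components.
from itertools import combinations
from math import factorial

def shapley_values(components, value):
    """Exact Shapley values; cost is exponential in len(components)."""
    n = len(components)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):                      # |S| ranges over 0..n-1
            for subset in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = value([components[j] for j in sorted(subset + (i,))])
                without_i = value([components[j] for j in subset])
                phi[i] += w * (with_i - without_i)
    return phi

def toy_value(parts):
    # Hypothetical stand-in for an LLM query: score a prompt subset by
    # the keywords it contains instead of by model output probability.
    text = " ".join(parts)
    return 2.0 * ("persona" in text) + 1.0 * ("question" in text)

components = ["persona: a stoic mentor",
              "context: a stressful exam",
              "the user question"]
for part, phi in zip(components, shapley_values(components, toy_value)):
    print(f"{phi:+.3f}  {part}")
```

Exact enumeration is only feasible for a handful of components; with longer prompts, sampling-based Shapley estimators are the usual workaround.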
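And as noted in the local-intrinsic-dimension entry above, here is a toy illustration of estimating LID with the Levina-Bickel maximum-likelihood estimator on nearest-neighbor distances. The synthetic points stand in for LLM activations, which this sketch does not collect; the construction has true intrinsic dimension 2, so the estimate should land near that value.

```python
# Hedged sketch: Levina-Bickel MLE of local intrinsic dimension (LID).
import numpy as np

def lid_mle(point, reference, k=20):
    """Estimate LID at `point` from its k nearest neighbors in `reference`."""
    dists = np.sort(np.linalg.norm(reference - point, axis=1))
    dists = dists[dists > 0][:k]      # guard against zero self-distance
    # m_hat = [ mean over j of log(T_k / T_j) ]^{-1}, j = 1..k-1
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

rng = np.random.default_rng(0)
# A 2-D Gaussian linearly embedded in 64 dimensions: intrinsic dimension 2.
latent = rng.normal(size=(5000, 2))
points = latent @ rng.normal(size=(2, 64))

print(f"estimated LID: {lid_mle(points[0], points[1:]):.2f}")
```

In the paper's setting, `reference` would instead hold hidden activations collected from the model, and the LID at a generation's activation would be used to characterize its truthfulness.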