Have Large Language Models Developed a Personality?: Applicability of
Self-Assessment Tests in Measuring Personality in LLMs
- URL: http://arxiv.org/abs/2305.14693v1
- Date: Wed, 24 May 2023 03:53:43 GMT
- Title: Have Large Language Models Developed a Personality?: Applicability of
Self-Assessment Tests in Measuring Personality in LLMs
- Authors: Xiaoyang Song, Akshat Gupta, Kiyan Mohebbizadeh, Shujie Hu, Anant
Singh
- Abstract summary: We show that we do not yet have the right tools to measure personality in language models.
Previous works have evaluated machine personality through self-assessment personality tests.
- Score: 1.1316247605466567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Have Large Language Models (LLMs) developed a personality? The short answer
is a resounding "We Don't Know!". In this paper, we show that we do not yet
have the right tools to measure personality in language models. Personality is
an important characteristic that influences behavior. As LLMs emulate
human-like intelligence and performance in various tasks, a natural question to
ask is whether these models have developed a personality. Previous works have
evaluated machine personality through self-assessment personality tests, which
are a set of multiple-choice questions created to evaluate personality in
humans. A fundamental assumption here is that human personality tests can
accurately measure personality in machines. In this paper, we investigate the
emergence of personality in five LLMs of different sizes ranging from 1.5B to
30B. We propose the Option-Order Symmetry property as a necessary condition for
the reliability of these self-assessment tests. Under this condition, the
answer to self-assessment questions is invariant to the order in which the
options are presented. We find that many LLMs personality test responses do not
preserve option-order symmetry. We take a deeper look at LLMs test responses
where option-order symmetry is preserved to find that in these cases, LLMs do
not take into account the situational statement being tested and produce the
exact same answer irrespective of the situation being tested. We also identify
the existence of inherent biases in these LLMs which is the root cause of the
aforementioned phenomenon and makes self-assessment tests unreliable. These
observations indicate that self-assessment tests are not the correct tools to
measure personality in LLMs. Through this paper, we hope to draw attention to
the shortcomings of current literature in measuring personality in LLMs and
call for developing tools for machine personality measurement.
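The Option-Order Symmetry condition described in the abstract is straightforward to operationalize. The following is a minimal Python sketch, assuming a hypothetical ask_model(prompt) helper that returns the model's chosen option letter; the prompt format and helper name are illustrative assumptions, not the authors' exact protocol.

```python
from itertools import permutations

def option_order_symmetric(question, options, ask_model):
    """Return True when the model selects the same option *content*
    under every ordering of the answer choices."""
    chosen = set()
    for perm in permutations(options):
        letters = "ABCDEFG"[: len(perm)]
        prompt = question + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in zip(letters, perm)
        )
        reply = ask_model(prompt)          # hypothetical helper, returns e.g. "B"
        letter = reply.strip()[0].upper()  # take the leading option letter
        chosen.add(perm[letters.index(letter)])  # map the letter back to content
    return len(chosen) == 1  # symmetric iff one option is chosen across all orders
```

Note that a typical Likert-style item with five options yields 5! = 120 orderings, so in practice one might sample a subset of permutations rather than enumerating all of them.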
Related papers
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with result validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Advances in Large Language Models (LLMs) have led to their adoption as conversational agents in various domains.
We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs.
LLMs exhibit distinct and consistent personalities, which are highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z)
- LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to detect one's personality traits underlying social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
- Identifying Multiple Personalities in Large Language Models with External Evaluation [6.657168333238573]
Large Language Models (LLMs) are rapidly being integrated into everyday human applications.
Many recent studies quantify LLMs' personalities using self-assessment tests that are created for humans.
Yet many critics question the applicability and reliability of these self-assessment tests when applied to LLMs.
arXiv Detail & Related papers (2024-02-22T18:57:20Z)
- Challenging the Validity of Personality Tests for Large Language Models [2.9123921488295768]
Large language models (LLMs) behave increasingly human-like in text-based interactions.
LLMs' responses to personality tests systematically deviate from human responses.
arXiv Detail & Related papers (2023-11-09T11:54:01Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
- Self-Assessment Tests are Unreliable Measures of LLM Personality [2.887477629420772]
We analyze the reliability of personality scores obtained from self-assessment personality tests using two simple experiments.
We find that three semantically equivalent prompts lead to very different personality scores, a difference that is statistically significant for all traits in a large majority of scenarios.
Since most self-assessment tests take the form of multiple-choice questions (MCQs), we argue that the scores should also be robust to the order in which the options are presented.
arXiv Detail & Related papers (2023-09-15T05:19:39Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies, leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI) for this purpose.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way (a minimal sketch of the prompting idea follows this list).
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
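To make the personality-prompting idea concrete, here is a minimal Python sketch of keyword-based personality induction in the spirit of P2. The trait keyword lists and the complete(prompt) helper are assumptions for illustration, not the authors' implementation; the paper's full method reportedly builds a chain from trait keywords to a model-generated persona description, of which this shows only the simplest keyword-prefix step.

```python
# Illustrative-only Big Five trait keywords (not taken from the paper).
TRAIT_KEYWORDS = {
    "extraversion": ["outgoing", "energetic", "talkative", "sociable"],
    "agreeableness": ["considerate", "cooperative", "trusting", "helpful"],
}

def induce_personality(trait, question, complete):
    """Prefix a trait-conditioning passage to the task prompt and query
    the model through the hypothetical complete(prompt) helper."""
    persona = ", ".join(TRAIT_KEYWORDS[trait])
    conditioning = (
        f"You are a {persona} person. "
        "Answer the following question in a manner consistent with this character.\n"
    )
    return complete(conditioning + question)
```

Whether such conditioning yields a stable, measurable personality is exactly what the option-order symmetry check sketched above is designed to probe.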
This list is automatically generated from the titles and abstracts of the papers on this site.