Identifying Multiple Personalities in Large Language Models with
External Evaluation
- URL: http://arxiv.org/abs/2402.14805v1
- Date: Thu, 22 Feb 2024 18:57:20 GMT
- Title: Identifying Multiple Personalities in Large Language Models with
External Evaluation
- Authors: Xiaoyang Song, Yuta Adachi, Jessie Feng, Mouwei Lin, Linhao Yu, Frank
Li, Akshat Gupta, Gopala Anumanchipalli, Simerjot Kaur
- Abstract summary: Large Language Models (LLMs) are rapidly being integrated into everyday applications.
Many recent studies quantify LLMs' personalities using self-assessment tests that are created for humans.
Yet many critiques question the applicability and reliability of these self-assessment tests when applied to LLMs.
- Score: 6.657168333238573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) are integrated with human daily applications
rapidly, many societal and ethical concerns are raised regarding the behavior
of LLMs. One of the ways to comprehend LLMs' behavior is to analyze their
personalities. Many recent studies quantify LLMs' personalities using
self-assessment tests that are created for humans. Yet many critiques question
the applicability and reliability of these self-assessment tests when applied
to LLMs. In this paper, we investigate LLM personalities using an alternate
personality measurement method, which we refer to as the external evaluation
method, where instead of prompting LLMs with multiple-choice questions in the
Likert scale, we evaluate LLMs' personalities by analyzing their responses
toward open-ended situational questions using an external machine learning
model. We first fine-tuned a Llama2-7B model as the MBTI personality predictor
that outperforms the state-of-the-art models as the tool to analyze LLMs'
responses. Then, we prompt the LLMs with situational questions and ask them to
generate Twitter posts and comments, respectively, in order to assess their
personalities when playing two different roles. Using the external personality
evaluation method, we identify that the obtained personality types for LLMs are
significantly different when generating posts versus comments, whereas humans
show a consistent personality profile in these two different situations. This
shows that LLMs can exhibit different personalities based on different
scenarios, thus highlighting a fundamental difference between personality in
LLMs and humans. With our work, we call for a re-evaluation of personality
definition and measurement in LLMs.
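The comparison described in the abstract, personality types predicted from posts versus from comments, can be illustrated with a minimal sketch. It assumes an external predictor has already assigned a four-letter MBTI label to each generated text. The label lists, the function names `letter_shares` and `role_gap`, and the simple share-gap measure are all hypothetical choices made for illustration; they are not the paper's predictor or its statistical test.

```python
from collections import Counter

# The four MBTI dichotomies, in the order their letters appear in a label (e.g. "INTJ").
DIMENSIONS = ["EI", "SN", "TF", "JP"]

def letter_shares(labels):
    """Return, per dichotomy, the fraction of labels carrying its first letter."""
    shares = {}
    for pos, dim in enumerate(DIMENSIONS):
        counts = Counter(label[pos] for label in labels)
        shares[dim] = counts[dim[0]] / len(labels)
    return shares

def role_gap(post_labels, comment_labels):
    """Absolute difference in first-letter share between the two roles."""
    posts = letter_shares(post_labels)
    comments = letter_shares(comment_labels)
    return {dim: abs(posts[dim] - comments[dim]) for dim in DIMENSIONS}

# Hypothetical predictor outputs, made up for illustration only.
post_labels = ["ENFJ", "ENFP", "ENTJ", "ESFJ"]
comment_labels = ["INTP", "INTJ", "ENTP", "ISTJ"]

gaps = role_gap(post_labels, comment_labels)
for dim, gap in gaps.items():
    print(f"{dim}: gap in share of '{dim[0]}' = {gap:.2f}")
```

A large gap on a dichotomy would suggest the predicted type shifts between the two roles, while a gap near zero would suggest a consistent profile. Any real analysis would need many more samples and a proper significance test.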
Related papers
- LMLPA: Language Model Linguistic Personality Assessment [11.599282127259736]
Large Language Models (LLMs) are increasingly used in everyday life and research.
Measuring the personality of a given LLM is currently a challenge.
This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs.
arXiv Detail & Related papers (2024-10-23T07:48:51Z)
- LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32% at run time.
arXiv Detail & Related papers (2024-07-03T17:08:52Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Large Language Models (LLMs) have been adapted to various domains as conversational agents.
We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs.
LLMs exhibit distinct and consistent personality, which is highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z)
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models [4.742123770879715]
The work represents a step up in understanding the dense relationship between NLP and human psychology through the lens of Open LLMs.
Our approach involves evaluating the intrinsic personality traits of Open LLM agents and determining the extent to which these agents can mimic human personalities.
arXiv Detail & Related papers (2024-01-13T16:41:40Z)
- Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models [3.692410936160711]
This study explores the idea of AI Personality, or AInality, suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities.
Using projective tests, we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning.
Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions.
arXiv Detail & Related papers (2023-12-21T04:57:21Z)
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models [2.918940961856197]
We aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widespread human personality assessment tool, as an evaluation metric for large language models (LLMs).
Specifically, experiments will be conducted to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types by prompt engineering, and 3) how the training dataset affects the model's personality.
arXiv Detail & Related papers (2023-07-30T09:34:35Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- In-Context Impersonation Reveals Large Language Models' Strengths and Biases [56.61129643802483]
We ask LLMs to assume different personas before solving vision and language tasks.
We find that LLMs pretending to be children of different ages recover human-like developmental stages.
In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts.
arXiv Detail & Related papers (2023-05-24T09:13:15Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.