Challenging the Validity of Personality Tests for Large Language Models
- URL: http://arxiv.org/abs/2311.05297v2
- Date: Wed, 5 Jun 2024 10:33:18 GMT
- Title: Challenging the Validity of Personality Tests for Large Language Models
- Authors: Tom Sühr, Florian E. Dorner, Samira Samadi, Augustin Kelava,
- Abstract summary: Large language models (LLMs) behave increasingly human-like in text-based interactions.
LLMs' responses to personality tests systematically deviate from human responses.
- Score: 2.9123921488295768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With large language models (LLMs) like GPT-4 appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate personality traits of LLMs using questionnaires originally developed for humans. While reusing measures is a resource-efficient way to evaluate LLMs, careful adaptations are usually required to ensure that assessment results are valid even across human subpopulations. In this work, we provide evidence that LLMs' responses to personality tests systematically deviate from human responses, implying that the results of these tests cannot be interpreted in the same way. Concretely, reverse-coded items ("I am introverted" vs. "I am extraverted") are often both answered affirmatively. Furthermore, variation across prompts designed to "steer" LLMs to simulate particular personality types does not follow the clear separation into five independent personality factors from human samples. In light of these results, we believe that it is important to investigate tests' validity for LLMs before drawing strong conclusions about potentially ill-defined concepts like LLMs' "personality".
Related papers
- Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis [0.27309692684728604]
We prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs.
We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties.
arXiv Detail & Related papers (2024-05-12T10:52:15Z) - "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance.
We find that first-person expressions decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy.
Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z) - LLMvsSmall Model? Large Language Model Based Text Augmentation Enhanced
Personality Detection Model [58.887561071010985]
Personality detection aims to detect one's personality traits underlying in social media posts.
Most existing methods learn post features directly by fine-tuning the pre-trained language models.
We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Self-Assessment Tests are Unreliable Measures of LLM Personality [2.887477629420772]
We analyze the reliability of personality scores obtained from self-assessment personality tests using two simple experiments.
We find that all three prompts lead to very different personality scores, a difference that is statistically significant for all traits in a large majority of scenarios.
Since most of the self-assessment tests exist in the form of multiple choice question (MCQ) questions, we argue that the scores should also be robust to the order in which the options are presented.
arXiv Detail & Related papers (2023-09-15T05:19:39Z) - Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing
Perspective [63.92197404447808]
Large language models (LLMs) have shown some human-like cognitive abilities.
We propose an adaptive testing framework for LLM evaluation.
This approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Revisiting the Reliability of Psychological Scales on Large Language
Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs)
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits [30.770525830385637]
We study the behavior of large language models (LLMs) based on the Big Five personality model.
Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types.
Human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%.
arXiv Detail & Related papers (2023-05-04T04:58:00Z) - Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.