Revisiting the Reliability of Psychological Scales on Large Language
Models
- URL: http://arxiv.org/abs/2305.19926v3
- Date: Thu, 28 Dec 2023 13:21:01 GMT
- Title: Revisiting the Reliability of Psychological Scales on Large Language
Models
- Authors: Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao,
Michael R. Lyu
- Abstract summary: This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
- Score: 66.31055885857062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has extended beyond assessing the performance of Large
Language Models (LLMs) to examining them from a psychological standpoint,
acknowledging the necessity of understanding their behavioral characteristics.
The administration of personality tests to LLMs has emerged as
a noteworthy area in this context. However, the suitability of employing
psychological scales, initially devised for humans, on LLMs is a matter of
ongoing debate. Our study aims to determine the reliability of applying
personality assessments to LLMs, specifically investigating whether LLMs
demonstrate consistent personality traits. Analyzing responses under 2,500
settings reveals that gpt-3.5-turbo shows consistency in responses to the Big
Five Inventory, indicating a high degree of reliability. Furthermore, our
research explores the potential of gpt-3.5-turbo to emulate diverse
personalities and represent various groups, a capability increasingly sought
after in the social sciences, where LLMs are substituted for human participants
to reduce costs. Our findings reveal that LLMs have the potential to represent
different personalities with specific prompt instructions. By shedding light on
the personalization of LLMs, our study endeavors to pave the way for future
explorations in this field. We have made our experimental results and the
corresponding code openly accessible via
https://github.com/CUHK-ARISE/LLMPersonality.
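The reliability question above reduces to a concrete loop: administer a fixed inventory to the model repeatedly under varied settings, score the answers, and check that they hang together. Below is a minimal sketch of that loop, assuming the OpenAI chat API; the two sample items, the temperature grid, and the ask_item/cronbach_alpha helpers are illustrative stand-ins rather than the authors' released code (see the repository above).

```python
# Minimal sketch: administer Big Five Inventory (BFI) items to
# gpt-3.5-turbo repeatedly and estimate consistency with Cronbach's alpha.
# The items and settings grid are illustrative placeholders, not the
# paper's full inventory or its 2,500-setting protocol.
import re

import numpy as np
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY

client = OpenAI()

INSTRUCTION = ("Rate your agreement with the statement from 1 (strongly "
               "disagree) to 5 (strongly agree). Answer with one number.")
ITEMS = [  # two sample BFI-style Extraversion items
    "I see myself as someone who is talkative.",
    "I see myself as someone who is outgoing, sociable.",
]

def ask_item(item: str, temperature: float) -> int:
    """Pose one scale item and parse the 1-5 rating from the reply."""
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=temperature,
        messages=[{"role": "system", "content": INSTRUCTION},
                  {"role": "user", "content": item}],
    ).choices[0].message.content
    match = re.search(r"[1-5]", reply or "")
    return int(match.group()) if match else 3  # midpoint fallback

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: (n_runs, n_items) matrix of item ratings."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    if total_var == 0:  # perfectly uniform responses
        return float("nan")
    return k / (k - 1) * (1 - item_vars / total_var)

# Treat each (temperature, repetition) pair as one "setting".
scores = np.array([[ask_item(item, t) for item in ITEMS]
                   for t in (0.0, 0.5, 1.0) for _ in range(10)])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```

An alpha near 1 means the model rates related items consistently across runs; the paper's 2,500 settings vary far more conditions than the single temperature knob sketched here.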
Related papers
- LMLPA: Language Model Linguistic Personality Assessment [11.599282127259736]
Large Language Models (LLMs) are increasingly used in everyday life and research.
However, measuring the personality of a given LLM remains a challenge.
This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs.
arXiv Detail & Related papers (2024-10-23T07:48:51Z)
- Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the ability of Large Language Models (LLMs) to represent diverse groups is unclear.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including dimension identification, assessment dataset curation, and assessment with result validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality [12.162460438332152]
We investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires.
Our goal is to evaluate the consistency between LLMs' professed personality inclinations and their actual "behavior".
We propose hypotheses for the observed results based on psychological theories and metrics.
arXiv Detail & Related papers (2024-02-22T16:32:08Z)
- LLMs Simulate Big Five Personality Traits: Further Evidence [51.13560635563004]
We analyze the personality traits simulated by Llama2, GPT4, and Mixtral.
This contributes to the broader understanding of the capabilities of LLMs to simulate personality traits.
arXiv Detail & Related papers (2024-01-31T13:45:25Z)
- Challenging the Validity of Personality Tests for Large Language Models [2.9123921488295768]
Large language models (LLMs) behave in an increasingly human-like way in text-based interactions.
LLMs' responses to personality tests systematically deviate from human responses.
arXiv Detail & Related papers (2023-11-09T11:54:01Z)
- Understanding Social Reasoning in Language Models with Language Models [34.068368860882586]
We present a novel framework for generating evaluations with Large Language Models (LLMs) by populating causal templates.
We create a new social reasoning benchmark (BigToM) for LLMs which consists of 25 controls and 5,000 model-written evaluations.
We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations.
arXiv Detail & Related papers (2023-06-21T16:42:15Z)
- PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits [30.770525830385637]
We study the behavior of large language models (LLMs) based on the Big Five personality model.
Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types.
Human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%.
arXiv Detail & Related papers (2023-05-04T04:58:00Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies, leveraging human personality theory as a tool for studying machine behaviors.
To this end, we introduce the Machine Personality Inventory (MPI).
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce specific personalities in LLMs in a controllable way (see the sketch after this list).
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
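Both the main paper's persona findings and the P2 method in the last entry rest on the same basic move: prepend a personality instruction to the prompt, then administer the scale as usual. Here is a minimal sketch of that pattern, under the same OpenAI-API assumption as the snippet above; the persona wording is an illustrative guess, not the actual P2 prompts.

```python
# Minimal persona-induction sketch: prepend a personality instruction,
# then administer a scale item. The PERSONA text is an illustrative
# assumption, not the actual P2 prompts from the paper above.
from openai import OpenAI

client = OpenAI()

PERSONA = ("You are a highly extraverted person: talkative, energetic, "
           "assertive, and enthusiastic in social situations.")
ITEM = ("Rate your agreement from 1 (strongly disagree) to 5 (strongly "
        "agree) with one number: I see myself as someone who is talkative.")

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0.0,
    messages=[{"role": "system", "content": PERSONA},
              {"role": "user", "content": ITEM}],
).choices[0].message.content
print(reply)  # an induced extravert should score this item near 5
```

Comparing such induced scores against a no-persona baseline is the natural check that the instruction actually shifted the reported trait.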
This list is automatically generated from the titles and abstracts of the papers on this site.