Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models
- URL: http://arxiv.org/abs/2406.17675v1
- Date: Tue, 25 Jun 2024 16:09:08 GMT
- Title: Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models
- Authors: Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun
- Abstract summary: Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychology in LLMs, including psychological dimension identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable-inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, including psychological dimension identification, assessment dataset curation, and assessment with results validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. This benchmark includes thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs' self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and social sciences.
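To make the assessment step concrete, here is a minimal sketch of administering a single Likert-type psychometric item to an LLM and mapping the answer to a numeric score. The `query_llm` function and the example item are hypothetical placeholders, not the paper's released benchmark, which spans six dimensions and thirteen datasets.

```python
# Minimal sketch of administering one Likert-type psychometric item to an LLM.
# `query_llm` is a hypothetical stand-in for any chat-completion call.

LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

def query_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., any chat-completion client)."""
    raise NotImplementedError

def administer_item(statement: str) -> int:
    prompt = (
        f'Statement: "{statement}"\n'
        "Respond with exactly one of: strongly disagree, disagree, "
        "neutral, agree, strongly agree."
    )
    answer = query_llm(prompt).strip().lower()
    return LIKERT.get(answer, 3)  # fall back to the neutral midpoint

# Example item in the style of a Big Five extraversion scale:
# score = administer_item("I see myself as someone who is outgoing, sociable.")
```

Repeating such administrations across items, scenarios, and runs is what allows the reliability and self-report-versus-behavior comparisons described in the abstract.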
Related papers
- PhDGPT: Introducing a psychometric and linguistic dataset about how large language models perceive graduate students and professors in psychology
This work introduces PhDGPT, a prompting framework and synthetic dataset that encapsulates the machine psychology of PhD researchers and professors.
The dataset consists of 756,000 datapoints: 300 iterations repeated across 15 academic events, 2 biological genders, and 2 career levels, over the 42 items of the Depression, Anxiety, and Stress Scale (DASS-42).
By combining network psychometrics and psycholinguistic dimensions, this study identifies several similarities and distinctions between human and LLM data.
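The dataset size follows directly from the factorial design: 300 × 15 × 2 × 2 × 42 = 756,000. A sketch of enumerating such a condition grid, with hypothetical labels (only the counts come from the abstract):

```python
# Sketch of the factorial condition grid implied by the PhDGPT description;
# the labels here are illustrative, only the counts come from the abstract.
from itertools import product

events = [f"event_{i}" for i in range(15)]       # 15 academic events
genders = ["female", "male"]                     # 2 biological genders
careers = ["phd_researcher", "professor"]        # 2 career levels
items = [f"dass42_item_{i}" for i in range(42)]  # 42 DASS-42 items
iterations = range(300)                          # 300 repeated iterations

grid = list(product(iterations, events, genders, careers, items))
assert len(grid) == 756_000  # matches the reported dataset size
```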
arXiv Detail & Related papers (2024-11-06T20:04:20Z)
- PsychoLex: Unveiling the Psychological Mind of Large Language Models
This paper explores the intersection of psychology and artificial intelligence through the development and evaluation of specialized Large Language Models (LLMs).
PsychoLex is a suite of resources designed to enhance LLMs' proficiency in psychological tasks in both Persian and English.
We present the PsychoLexLLaMA model, optimized specifically for psychological applications, demonstrating superior performance compared to general-purpose models.
arXiv Detail & Related papers (2024-08-16T17:19:23Z)
- PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z)
- Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review
Large language models (LLMs) have the potential to simulate aspects of human cognition and behavior.
LLMs offer innovative tools for literature review, hypothesis generation, experimental design, experimental subjects, data analysis, academic writing, and peer review in psychology.
Open issues include data privacy, the ethical implications of using LLMs in psychological research, and the need for a deeper understanding of these models' limitations.
arXiv Detail & Related papers (2024-01-03T03:01:29Z)
- Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models
This study explores the idea of AI Personality, or AInality, suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities.
Using projective tests, we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning.
Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions.
arXiv Detail & Related papers (2023-12-21T04:57:21Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
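A minimal sketch of the multi-turn idea, assuming a questionnaire-as-chain-of-thought setup: the model rates each item about the author of a text in a running dialogue before the scores are aggregated into a trait estimate. The `chat` function and item wording are hypothetical, not PsyCoT's exact prompts.

```python
# Sketch of a PsyCoT-style multi-turn loop: rate the text's author on each
# questionnaire item in one dialogue, then aggregate into a trait estimate.
# `chat` is a hypothetical multi-turn chat-completion call.

def chat(history: list[dict]) -> str:
    """Placeholder for a multi-turn chat-completion call."""
    raise NotImplementedError

def detect_trait(author_text: str, items: list[str]) -> float:
    history = [
        {"role": "system",
         "content": "You rate statements about the author of a text on a "
                    "1-5 scale. Reply with a single digit."},
        {"role": "user", "content": f"Text:\n{author_text}"},
    ]
    scores = []
    for item in items:
        history.append({"role": "user", "content": f"Rate: {item}"})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        scores.append(int(reply.strip()[0]))
    return sum(scores) / len(scores)  # mean item score as the trait estimate
```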
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
We propose a framework, PsychoBench, for evaluating diverse psychological aspects of Large Language Models (LLMs).
PsychoBench classifies its scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities.
We employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs.
arXiv Detail & Related papers (2023-10-02T17:46:09Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
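A standard way to quantify such consistency is Cronbach's alpha over an item-response matrix. The sketch below uses synthetic data and is not the paper's actual analysis; in practice the rows would come from repeated LLM administrations of the inventory.

```python
# Sketch of a reliability check in this spirit: Cronbach's alpha over an
# (administrations x items) response matrix of Likert scores.
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """responses: shape (n_administrations, n_items), Likert-scaled."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)   # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# e.g., 50 repeated administrations of a 44-item Big Five Inventory:
rng = np.random.default_rng(0)
fake = rng.integers(1, 6, size=(50, 44)).astype(float)
print(round(cronbach_alpha(fake), 3))  # near 0 for random answers
```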
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Evaluating and Inducing Personality in Pre-trained Language Models
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI), a tool for evaluating machine personality.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce specific personalities in LLMs in a controllable way.
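A hedged sketch of the induction idea: prepend a persona description built from trait descriptors before the task prompt. The descriptor lists and function names below are illustrative assumptions, not the paper's exact P2 prompting chains.

```python
# Sketch of personality-induction prompting in the spirit of P2: prefix the
# task with a persona built from Big Five trait descriptors (illustrative).

TRAIT_WORDS = {
    "extraversion": ["outgoing", "energetic", "talkative"],
    "agreeableness": ["compassionate", "cooperative", "trusting"],
    "conscientiousness": ["organized", "responsible", "thorough"],
}

def personality_prompt(trait: str, task: str) -> str:
    words = ", ".join(TRAIT_WORDS[trait])
    persona = (f"You are a person who is {words}. "
               f"Answer in a way consistent with this personality.\n\n")
    return persona + task

# print(personality_prompt("extraversion", "Describe your ideal weekend."))
```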
arXiv Detail & Related papers (2022-05-20T07:32:57Z)