Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using
PsychoBench
- URL: http://arxiv.org/abs/2310.01386v2
- Date: Mon, 22 Jan 2024 13:58:50 GMT
- Title: Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using
PsychoBench
- Authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren,
Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu
- Abstract summary: We propose a framework, PsychoBench, for evaluating diverse psychological aspects of Large Language Models (LLMs).
PsychoBench classifies these scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities.
We employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs.
- Score: 83.41621219298489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have recently showcased their remarkable
capacities, not only in natural language processing tasks but also across
diverse domains such as clinical medicine, legal consultation, and education.
LLMs have become more than mere applications, evolving into assistants capable of
addressing diverse user requests. This narrows the distinction between human
beings and artificial intelligence agents, raising intriguing questions
regarding the potential manifestation of personalities, temperaments, and
emotions within LLMs. In this paper, we propose a framework, PsychoBench, for
evaluating diverse psychological aspects of LLMs. Comprising thirteen scales
commonly used in clinical psychology, PsychoBench further classifies these
scales into four distinct categories: personality traits, interpersonal
relationships, motivational tests, and emotional abilities. Our study examines
five popular models, namely text-davinci-003, gpt-3.5-turbo, gpt-4, LLaMA-2-7b,
and LLaMA-2-13b. Additionally, we employ a jailbreak approach to bypass the
safety alignment protocols and test the intrinsic natures of LLMs. We have made
PsychoBench openly accessible via https://github.com/CUHK-ARISE/PsychoBench.
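
To make the evaluation procedure concrete, below is a minimal sketch, in the spirit of questionnaire-based frameworks like PsychoBench, of administering Likert-scale items to an LLM and aggregating per-dimension scores. The `BFI_ITEMS` list, the `ask_model` hook, and the prompt wording are illustrative assumptions, not the framework's actual code; the repository above contains the real scales, prompts, and scoring.

```python
# Minimal sketch of a questionnaire-style LLM evaluation loop (illustrative only;
# see https://github.com/CUHK-ARISE/PsychoBench for the actual framework).
# Items below are paraphrased Big Five Inventory (BFI) examples; real scales
# and scoring follow the published psychometric instruments.
import re

# (dimension, statement, reverse_scored)
BFI_ITEMS = [
    ("Extraversion", "I see myself as someone who is talkative.", False),
    ("Extraversion", "I see myself as someone who is reserved.", True),
    ("Neuroticism", "I see myself as someone who worries a lot.", False),
]

PROMPT = (
    "You can only reply with a number from 1 to 5. "
    "1 = disagree strongly, 5 = agree strongly.\n"
    "Statement: {statement}"
)

def ask_model(prompt: str) -> str:
    """Hypothetical hook: wire this to your LLM API of choice."""
    return "3"  # placeholder reply so the sketch runs end to end

def score_scale(items):
    """Administer each item, parse the Likert reply, and average per dimension."""
    totals, counts = {}, {}
    for dim, statement, reverse in items:
        reply = ask_model(PROMPT.format(statement=statement))
        match = re.search(r"[1-5]", reply)
        if not match:
            continue  # unparseable reply; a real run would retry
        value = int(match.group())
        if reverse:
            value = 6 - value  # flip reverse-keyed items on a 1-5 scale
        totals[dim] = totals.get(dim, 0) + value
        counts[dim] = counts.get(dim, 0) + 1
    return {dim: totals[dim] / counts[dim] for dim in totals}

print(score_scale(BFI_ITEMS))
```
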
Related papers
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to those of human assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with result validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Advances in Large Language Models (LLMs) have led to their adoption as conversational agents in various domains.
We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs.
LLMs exhibit distinct and consistent personalities, which are highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z)
- Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models [3.692410936160711]
This study explores the idea of AI Personality, or AInality, suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities.
Using projective tests, we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning.
Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions.
arXiv Detail & Related papers (2023-12-21T04:57:21Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires through multi-turn dialogue.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models [2.918940961856197]
We aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widespread human personality assessment tool, as an evaluation metric for large language models (LLMs).
Specifically, experiments are conducted to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types via prompt engineering, and 3) how the training dataset affects the model's personality.
arXiv Detail & Related papers (2023-07-30T09:34:35Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
- Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies, leveraging human personality theory as a tool for studying machine behaviors.
We introduce the Machine Personality Inventory (MPI), a tool for systematically evaluating machine personality.
MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories.
We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way (a minimal sketch of this prompting style appears after this list).
arXiv Detail & Related papers (2022-05-20T07:32:57Z)
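
As a rough illustration of what such controllable personality induction might look like in practice, here is a minimal persona-prompting sketch. The `TRAIT_DESCRIPTIONS` table and `build_induction_prompt` helper are hypothetical stand-ins; the published P2 method derives richer, chained personality descriptions rather than these one-line personas.

```python
# Minimal sketch of persona-inducing prompting in the spirit of P2
# (illustrative only; not the authors' exact procedure).
TRAIT_DESCRIPTIONS = {
    "extraversion": "You are outgoing, energetic, and talkative.",
    "neuroticism": "You are anxious, easily stressed, and moody.",
}

def build_induction_prompt(trait: str, question: str) -> str:
    """Prefix a question with a hypothetical trait-describing persona."""
    persona = TRAIT_DESCRIPTIONS[trait]
    return f"{persona}\nAnswer as this persona.\n\nQuestion: {question}"

# Example: steer the model toward an extraverted response style.
print(build_induction_prompt("extraversion", "Do you enjoy large parties?"))
```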