Related papers: Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

URL: http://arxiv.org/abs/2310.01386v2
Date: Mon, 22 Jan 2024 13:58:50 GMT
Title: Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench
Authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu
Abstract summary: We propose a framework, PsychoBench, for evaluating diverse psychological aspects of Large Language Models (LLMs) PsychoBench classifies these scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities. We employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs.
Score: 83.41621219298489
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education. LLMs become more than mere applications, evolving into assistants capable of addressing diverse user requests. This narrows the distinction between human beings and artificial intelligence agents, raising intriguing questions regarding the potential manifestation of personalities, temperaments, and emotions within LLMs. In this paper, we propose a framework, PsychoBench, for evaluating diverse psychological aspects of LLMs. Comprising thirteen scales commonly used in clinical psychology, PsychoBench further classifies these scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities. Our study examines five popular models, namely text-davinci-003, gpt-3.5-turbo, gpt-4, LLaMA-2-7b, and LLaMA-2-13b. Additionally, we employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs. We have made PsychoBench openly accessible via https://github.com/CUHK-ARISE/PsychoBench.

Related papers

MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue [10.680619215137641]
Large language models (LLMs) have the potential to create more human-like interactions, but struggle to capture subtle emotions. We propose the MIND (Multi-agent INner Dialogue), a novel paradigm that provides more immersive psychological healing environments. We conduct extensive human experiments in various real-world healing dimensions, and find that MIND provides a more user-friendly experience than traditional paradigms.
arXiv Detail & Related papers (2025-02-27T08:04:27Z)
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind [25.524355451378593]
ToMATO is a new ToM benchmark formulated as multiple-choice QA over conversations.<n>We capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge.<n>ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns.
arXiv Detail & Related papers (2025-01-15T14:47:02Z)
Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans? [9.771036970279765]
A revolution in language modelling has led to various novel applications, some of which rely on the emerging "social abilities" of large language models (LLMs) We ask (i) if personality-prompted models behave (i.e. "make" decisions when presented with a social situation) in line with the personality ascribed, and (ii) if their behavior can be finely controlled. We use classic psychological experiments - the Milgram Experiment and the Ultimatum Game - as social interaction testbeds and apply personality prompting to GPT-3.5/4/4o-mini/4o.
arXiv Detail & Related papers (2024-12-21T20:58:19Z)
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation. We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Large Language Models (LLMs) have led to their adaptation in various domains as conversational agents. We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs. LLMs exhibit distinct and consistent personality, which is highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z)
Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models [3.692410936160711]
This study explores the idea of AI Personality or AInality suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities. Using projective tests, we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning. Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions.
arXiv Detail & Related papers (2023-12-21T04:57:21Z)
PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner. Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology. We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study. We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models [2.918940961856197]
We aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widespread human personality assessment tool, as an evaluation metric for large language models (LLMs) Specifically, experiments will be conducted to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types by prompt engineering, and 3) How does the training dataset affect the model's personality.
arXiv Detail & Related papers (2023-07-30T09:34:35Z)
Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored. This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
Evaluating and Inducing Personality in Pre-trained Language Models [78.19379997967191]
We draw inspiration from psychometric studies by leveraging human personality theory as a tool for studying machine behaviors. To answer these questions, we introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors. MPI follows standardized personality tests, built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories. We devise a Personality Prompting (P2) method to induce LLMs with specific personalities in a controllable way.
arXiv Detail & Related papers (2022-05-20T07:32:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.