Related papers: Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

URL: http://arxiv.org/abs/2504.08399v1
Date: Fri, 11 Apr 2025 10:03:55 GMT
Title: Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models
Authors: Yin Jou Huang, Rafik Hadfi,
Abstract summary: This paper introduces a novel multi-observer framework for Large language models (LLMs) personality assessment.<n>Instead of relying solely on self-assessments, our approach employs multiple observer agents configured with a specific relationship context.<n>Our experiments reveal that LLMs possess systematic biases in self-report personality ratings.
Score: 2.7010154811483167
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: There is a growing interest in assessing the personality traits of Large language models (LLMs). However, traditional personality assessments based on self-report questionnaires may fail to capture their true behavioral nuances due to inherent biases and meta-knowledge contamination. This paper introduces a novel multi-observer framework for LLM personality assessment that draws inspiration from informant-report methods in psychology. Instead of relying solely on self-assessments, our approach employs multiple observer agents configured with a specific relationship context (e.g., family, friend, or workplace) to simulate interactive scenarios with a subject LLM. These observers engage in dialogues and subsequently provide ratings across the Big Five personality dimensions. Our experiments reveal that LLMs possess systematic biases in self-report personality ratings. Moreover, aggregating observer ratings effectively reduces non-systematic biases and achieves optimal reliability with 5-7 observers. The findings highlight the significant impact of relationship context on personality perception and demonstrate that a multi-observer paradigm yields a more robust and context-sensitive evaluation of LLM personality traits.

Related papers

Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models [70.180385882195]
This paper introduces a personality-aware user simulation for Conversational Recommender Systems (CRSs) The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs. Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits.
arXiv Detail & Related papers (2025-04-09T13:21:17Z)
Can LLM Agents Maintain a Persona in Discourse? [3.286711575862228]
Large Language Models (LLMs) are widely used as conversational agents, exploiting their capabilities in various sectors such as education, law, medicine, and more. LLMs are often subjected to context-shifting behaviour, resulting in a lack of consistent and interpretable personality-aligned interactions. We show that while LLMs can be guided toward personality-driven dialogue, their ability to maintain personality traits varies significantly depending on the combination of models and discourse settings.
arXiv Detail & Related papers (2025-02-17T14:36:39Z)
Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires [3.6001840369062386]
This work applies psychological tools to Large Language Models in diverse scenarios to generate personality profiles. Our findings reveal that LLMs exhibit unique traits, varying characteristics, and distinct personality profiles even within the same family of models.
arXiv Detail & Related papers (2025-02-07T16:12:52Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation. We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
LLMvsSmall Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to detect one's personality traits underlying in social media posts. Most existing methods learn post features directly by fine-tuning the pre-trained language models. We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality [11.660251022962141]
We investigate the validity of conventional personality questionnaires in capturing the human-like personality traits of Large Language Models (LLMs)<n>Our objective is to assess the congruence between the personality traits LLMs claim to possess and their demonstrated tendencies in real-world scenarios.
arXiv Detail & Related papers (2024-02-22T16:32:08Z)
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs) We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence. We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits [30.770525830385637]
We study the behavior of large language models (LLMs) based on the Big Five personality model. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types. Human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%.
arXiv Detail & Related papers (2023-05-04T04:58:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.