Applying Psychometrics to Large Language Model Simulated Populations: Recreating the HEXACO Personality Inventory Experiment with Generative Agents
- URL: http://arxiv.org/abs/2508.00742v1
- Date: Fri, 01 Aug 2025 16:16:16 GMT
- Title: Applying Psychometrics to Large Language Model Simulated Populations: Recreating the HEXACO Personality Inventory Experiment with Generative Agents
- Authors: Sarah Mercer, Daniel P. Martin, Phil Swatton
- Abstract summary: Generative agents demonstrate human-like characteristics through sophisticated natural language interactions. Their ability to assume roles and personalities based on predefined character biographies has positioned them as cost-effective substitutes for human participants in social science research. This paper explores the validity of such persona-based agents in representing human populations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative agents powered by Large Language Models demonstrate human-like characteristics through sophisticated natural language interactions. Their ability to assume roles and personalities based on predefined character biographies has positioned them as cost-effective substitutes for human participants in social science research. This paper explores the validity of such persona-based agents in representing human populations; we recreate the HEXACO personality inventory experiment by surveying 310 GPT-4 powered agents, conducting factor analysis on their responses, and comparing these results to the original findings presented by Ashton, Lee, & Goldberg in 2004. Our results found that 1) a coherent and reliable personality structure was recoverable from the agents' responses, demonstrating partial alignment with the HEXACO framework; 2) the derived personality dimensions were consistent and reliable within GPT-4 when coupled with a sufficiently curated population; and 3) cross-model analysis revealed variability in personality profiling, suggesting model-specific biases and limitations. We discuss the practical considerations and challenges encountered during the experiment. This study contributes to the ongoing discourse on the potential benefits and limitations of using generative agents in social science research and provides useful guidance on designing consistent and representative agent personas to maximise coverage and representation of human personality traits.
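The factor-analytic step described in the abstract can be illustrated with a stdlib-only toy: build an item correlation matrix from Likert responses, then extract first-factor loadings by power iteration. Everything here is an illustrative simplification under assumed data — the paper's actual analysis applies full multi-factor exploratory factor analysis to 310 agents' HEXACO responses, not this single-factor sketch.

```python
def correlation_matrix(responses):
    """Pearson correlations between items; responses[i] is one agent's answers."""
    n, k = len(responses), len(responses[0])
    means = [sum(r[j] for r in responses) / n for j in range(k)]

    def cov(a, b):
        return sum((r[a] - means[a]) * (r[b] - means[b]) for r in responses) / n

    sd = [cov(j, j) ** 0.5 for j in range(k)]
    return [[cov(a, b) / (sd[a] * sd[b]) for b in range(k)] for a in range(k)]


def first_factor_loadings(corr, iters=200):
    """Dominant eigenvector of the correlation matrix via power iteration,
    scaled by sqrt(eigenvalue) so the entries act as first-factor loadings."""
    k = len(corr)
    v, eig = [1.0] * k, 1.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        eig = max(abs(x) for x in w)      # dominant eigenvalue estimate
        v = [x / eig for x in w]          # re-scale to avoid overflow
    norm = sum(x * x for x in v) ** 0.5
    return [(x / norm) * eig ** 0.5 for x in v]


# Five hypothetical agents answering three Likert items; items 0 and 1
# track each other, while item 2 runs in the opposite direction.
responses = [[1, 2, 5], [2, 1, 4], [4, 5, 1], [5, 4, 2], [3, 3, 3]]
loadings = first_factor_loadings(correlation_matrix(responses))
print(loadings)  # items 0 and 1 load strongly positive, item 2 strongly negative
```

Opposite-signed loadings on one factor are exactly the pattern reverse-keyed inventory items produce, which is why sign structure matters when comparing recovered factors against the HEXACO reference solution.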
Related papers
- Personality Structured Interview for Large Language Model Simulation in Personality Research [8.208325358490807]
We explore the potential of the theory-informed Personality Structured Interview as a tool for simulating human responses in personality research. We have provided a growing set of 357 structured interview transcripts from a representative sample, each containing an individual's response to 32 open-ended questions. Results from three experiments demonstrate that well-designed structured interviews could improve human-like heterogeneity in LLM-simulated personality data.
arXiv Detail & Related papers (2025-02-17T18:31:57Z) - Generative Agent Simulations of 1,000 People [56.82159813294894]
We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals.
The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers.
Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions.
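The "85% as accurately as participants replicate their own answers" framing above is a normalized accuracy: agent-vs-human agreement divided by the human's own test-retest agreement, so 1.0 means the agent matches the person as well as the person matches themselves. A minimal sketch with invented response vectors (not data from that paper):

```python
def match_rate(a, b):
    """Fraction of survey items on which two response vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_accuracy(agent, wave1, wave2):
    """Agent-vs-human agreement, normalized by human test-retest agreement."""
    return match_rate(agent, wave1) / match_rate(wave1, wave2)


# Illustrative responses to ten categorical survey items.
human_wave1 = [1, 2, 2, 3, 1, 1, 2, 3, 3, 1]
human_wave2 = [1, 2, 2, 3, 1, 2, 2, 3, 1, 1]   # the person drifts on 2 items
agent       = [1, 2, 2, 3, 2, 1, 2, 3, 1, 1]   # the agent differs on 2 items

print(normalized_accuracy(agent, human_wave1, human_wave2))  # 0.8 / 0.8 = 1.0
```

Normalizing this way keeps an agent from being penalized for human inconsistency: raw agreement is capped in practice by how stable the person's own answers are.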
arXiv Detail & Related papers (2024-11-15T11:14:34Z) - Designing LLM-Agents with Personalities: A Psychometric Approach [0.47498241053872914]
This research introduces a novel methodology for assigning quantifiable, controllable and psychometrically validated personalities to Agents.
It seeks to overcome the constraints of human subject studies, proposing Agents as an accessible tool for social science inquiry.
arXiv Detail & Related papers (2024-10-25T01:05:04Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - P-React: Synthesizing Topic-Adaptive Reactions of Personality Traits via Mixture of Specialized LoRA Experts [34.374681921626205]
We propose P-React, a mixture-of-experts (MoE)-based personalized large language model. In particular, we integrate a Personality Loss (PSL) to better capture individual trait expressions. To facilitate research in this field, we curate OCEAN-Chat, a high-quality, human-verified dataset.
arXiv Detail & Related papers (2024-06-18T12:25:13Z) - Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions [2.6080756513915824]
Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions.
This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information.
arXiv Detail & Related papers (2024-06-18T02:32:57Z) - ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z) - AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models [4.6251098692410855]
We propose the Agent Framework for Shaping Preference and Personality (AFSPP).
AFSPP explores the multifaceted impact of social networks and subjective consciousness on Agents' preference and personality formation.
It can significantly enhance the efficiency and scope of psychological experiments.
arXiv Detail & Related papers (2024-01-05T15:52:59Z) - Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z) - Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.