Can LLMs Assess Personality? Validating Conversational AI for Trait Profiling
- URL: http://arxiv.org/abs/2602.15848v1
- Date: Fri, 23 Jan 2026 15:46:42 GMT
- Title: Can LLMs Assess Personality? Validating Conversational AI for Trait Profiling
- Authors: Andrius Matšenas, Anet Lello, Tõnis Lees, Hans Peep, Kim Lilii Tamm,
- Abstract summary: This study validates Large Language Models (LLMs) as a dynamic alternative to questionnaire-based personality assessment.<n>Using a within-subjects experiment, we compared Big Five personality scores derived from guided LLM conversations against the gold-standard IPIP-50 questionnaire.<n>Results indicate moderate convergent validity, with Conscientiousness, Openness, and Neuroticism scores statistically equivalent between methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study validates Large Language Models (LLMs) as a dynamic alternative to questionnaire-based personality assessment. Using a within-subjects experiment (N=33), we compared Big Five personality scores derived from guided LLM conversations against the gold-standard IPIP-50 questionnaire, while also measuring user-perceived accuracy. Results indicate moderate convergent validity (r=0.38-0.58), with Conscientiousness, Openness, and Neuroticism scores statistically equivalent between methods. Agreeableness and Extraversion showed significant differences, suggesting trait-specific calibration is needed. Notably, participants rated LLM-generated profiles as equally accurate as traditional questionnaire results. These findings suggest conversational AI offers a promising new approach to traditional psychometrics.
Related papers
- Individual Turing Test: A Case Study of LLM-based Simulation Using Longitudinal Personal Data [54.145424717168794]
Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored.<n>This paper presents a case study to investigate LLM-based individual simulation with a volunteer-contributed archive of private messaging history spanning over ten years.<n>We propose the "Individual Turing Test" to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer.
arXiv Detail & Related papers (2026-03-01T21:46:27Z) - Towards Consistent Detection of Cognitive Distortions: LLM-Based Annotation and Dataset-Agnostic Evaluation [2.699704259580951]
Text-based automated Cognitive Distortion detection is a challenging task due to its subjective nature.<n>We explore the use of Large Language Models (LLMs) as consistent and reliable annotators.
arXiv Detail & Related papers (2025-11-03T11:45:26Z) - Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors [4.814107439144414]
We introduce a novel approach that uncovers latent personality dimensions in large language models (LLMs)
Our experiments show that LLMs "rediscover" core personality traits such as extraversion, agreeableness, conscientiousness, neuroticism, and openness without relying on direct questionnaire inputs.
We can use the derived principal components to assess personality along the Big Five dimensions, and achieve improvements in average personality prediction accuracy of up to 5% over fine-tuned models.
arXiv Detail & Related papers (2024-09-16T00:24:40Z) - Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs)<n>Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets.<n>We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models [14.04596228819108]
This study exams whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues.
Our framework applies role-play and questionnaire-based prompting to condition LLMs on counseling sessions.
Our model achieves a 130.95% improvement, surpassing the state-of-the-art Qwen1.5-110B by 36.94% in personality prediction validity.
arXiv Detail & Related papers (2024-06-25T05:30:55Z) - Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z) - Challenging the Validity of Personality Tests for Large Language Models [2.9123921488295768]
Large language models (LLMs) behave increasingly human-like in text-based interactions.
LLMs' responses to personality tests systematically deviate from human responses.
arXiv Detail & Related papers (2023-11-09T11:54:01Z) - PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for
Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.