Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests
- URL: http://arxiv.org/abs/2602.17108v1
- Date: Thu, 19 Feb 2026 06:08:33 GMT
- Title: Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests
- Authors: Anton Dzega, Aviad Elyashar, Ortal Slobodin, Odeya Cohen, Rami Puzis
- Abstract summary: This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities. Evaluators demonstrated an excellent ability to understand and analyze TAT responses.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. It is a projective psychological instrument designed to uncover unconscious aspects of personality. This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities, using the Social Cognition and Object Relations Scale - Global (SCORS-G). LMMs are employed in two distinct roles: as subject models (SMs), which generate stories in response to TAT images, and as evaluator models (EMs), which assess these narratives using the SCORS-G framework. Evaluators demonstrated an excellent ability to understand and analyze TAT responses, and their interpretations are highly consistent with those of human experts. Assessment results show that all models understand interpersonal dynamics very well and have a good grasp of the concept of self; however, they consistently fail to perceive and regulate aggression. Performance varied systematically across model families, with larger and more recent models consistently outperforming smaller and earlier ones across SCORS-G dimensions.
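The two-role protocol described in the abstract, subject models producing stories from TAT cards and evaluator models scoring each story on SCORS-G dimensions, can be sketched as below. This is a minimal illustrative pipeline, not the paper's implementation: the function names, the placeholder stories and midpoint scores, and the dimension labels are all assumptions for the sketch (SCORS-G dimensions are conventionally rated on a 1-7 scale).

```python
# Illustrative sketch of the SM/EM assessment loop described in the abstract.
# Real subject/evaluator models would be LMM API calls; stubs are used here.
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScorsgRating:
    dimension: str
    score: int  # 1 (more pathological) to 7 (more adaptive), per SCORS-G convention


def subject_model(tat_card_id: str) -> str:
    """Stand-in for an LMM that generates a narrative for a TAT card."""
    return f"story generated for card {tat_card_id}"


def evaluator_model(story: str, dimensions: list[str]) -> list[ScorsgRating]:
    """Stand-in for an LMM that rates a story on each SCORS-G dimension."""
    return [ScorsgRating(d, 4) for d in dimensions]  # placeholder midpoint ratings


def assess(card_ids: list[str], dimensions: list[str]) -> dict[str, float]:
    """Run the SM -> EM loop and average ratings per dimension across cards."""
    collected: dict[str, list[int]] = {}
    for card in card_ids:
        story = subject_model(card)
        for rating in evaluator_model(story, dimensions):
            collected.setdefault(rating.dimension, []).append(rating.score)
    return {dim: mean(scores) for dim, scores in collected.items()}
```

A run over a card set would yield one mean score per dimension, which is the shape of result the abstract's per-dimension comparisons (e.g. strong interpersonal understanding versus weak aggression regulation) presuppose.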
Related papers
- HUMANLLM: Benchmarking and Reinforcing LLM Anthropomorphism via Human Cognitive Patterns [59.17423586203706]
We present HUMANLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from 12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment.
arXiv Detail & Related papers (2026-01-15T08:56:53Z)
- Structured Personality Control and Adaptation for LLM Agents [11.050618253938126]
Large Language Models (LLMs) are increasingly shaping human-computer interaction (HCI). We present a framework that models LLM personality via Jungian psychological types. This design allows the agent to maintain nuanced traits while dynamically adjusting to interaction demands.
arXiv Detail & Related papers (2026-01-15T03:15:24Z)
- From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration [18.359999860873426]
The House-Tree-Person drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. It has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and the lack of a unified quantitative coding system. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services.
arXiv Detail & Related papers (2025-12-23T09:26:23Z)
- Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities [45.56431615835303]
This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of Gamified Personality Assessment through Multi-Personality Representations (Multi-PR GPA).
arXiv Detail & Related papers (2025-07-05T11:17:20Z)
- Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis [0.0]
We develop a framework to assess concept alignment between Large Language Models and human psychological dimensions. A GPT-4 model achieved superior classification accuracy (66.2%), significantly outperforming GPT-3.5 (55.9%) and BERT (48.1%). Our findings demonstrate that modern LLMs can approximate human psychological constructs with measurable accuracy.
arXiv Detail & Related papers (2025-06-29T01:56:56Z)
- Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models [2.7010154811483167]
This paper proposes a novel multi-observer framework for personality trait assessments in LLM agents. Instead of relying on self-assessments, we employ multiple observer agents. We show that these observer-report ratings align more closely with human judgments than traditional self-assessments.
arXiv Detail & Related papers (2025-04-11T10:03:55Z)
- Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models [70.180385882195]
This paper introduces a personality-aware user simulation for Conversational Recommender Systems (CRSs). The user agent induces customizable personality traits and preferences, while the system agent possesses the persuasion capability to simulate realistic interaction in CRSs. Experimental results demonstrate that state-of-the-art LLMs can effectively generate diverse user responses aligned with specified personality traits.
arXiv Detail & Related papers (2025-04-09T13:21:17Z)
- Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs [65.93003087656754]
VisFactor is a benchmark that digitizes 20 vision-centric subtests from a well-established cognitive psychology assessment. We evaluate 20 frontier Multimodal Large Language Models (MLLMs) from the GPT, Gemini, Claude, LLaMA, Qwen, and SEED families. The best-performing model achieves a score of only 25.19 out of 100, with consistent failures on tasks such as mental rotation, spatial relation inference, and figure-ground discrimination.
arXiv Detail & Related papers (2025-02-23T04:21:32Z)
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning. For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction. For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to-date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.