Psychometric Comparability of LLM-Based Digital Twins
- URL: http://arxiv.org/abs/2601.14264v1
- Date: Mon, 22 Dec 2025 18:04:27 GMT
- Title: Psychometric Comparability of LLM-Based Digital Twins
- Authors: Yufei Zhang, Zhihao Ma
- Abstract summary: We benchmark digital twins against human gold standards across models and tasks, testing how person-specific inputs shape performance. Across studies, digital twins achieved high population-level accuracy and strong within-participant profile correlations. Digital twins under-reproduce biases, showing normative prediction, compressed variance and limited sensitivity to temporal information.
- Score: 2.7740826124350355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are used as "digital twins" to replace human respondents, yet their psychometric comparability to humans is uncertain. We propose a construct-validity framework spanning construct representation and the nomological net, benchmarking digital twins against human gold standards across models and tasks, and testing how person-specific inputs shape performance. Across studies, digital twins achieved high population-level accuracy and strong within-participant profile correlations, alongside attenuated item-level correlations. In word association tests, LLM-based networks show small-world structure and theory-consistent communities similar to humans, yet diverge lexically and in local structure. In decision-making and contextualized tasks, digital twins under-reproduce heuristic biases, showing normative rationality, compressed variance and limited sensitivity to temporal information. Feature-rich digital twins improve Big Five personality prediction, but their personality networks show only configural invariance and do not achieve metric invariance. In more applied free-text tasks, feature-rich digital twins better match human narratives, but linguistic differences persist. Together, these results indicate that feature-rich conditioning enhances validity but does not resolve systematic divergences in psychometric comparability. Future work should therefore prioritize delineating the effective boundaries of digital twins, establishing the precise contexts in which they function as reliable proxies for human cognition and behavior.
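The abstract's contrast between high population-level accuracy, strong within-participant profile correlations, and attenuated item-level correlations can be made concrete with a small sketch. This is illustrative only: the simulated responses and the variance-compression factor (`0.3`) are assumptions for demonstration, not the paper's actual data or pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
P, K = 100, 20                     # participants x scale items
item_means = np.linspace(2, 4, K)  # hypothetical item-difficulty gradient

# Simulated human responses: item mean plus individual deviation.
human = item_means + rng.normal(0.0, 1.0, size=(P, K))

# Hypothetical "digital twin" responses: they track item means well
# (normative prediction) but compress individual variance, as the
# abstract describes.
twin = item_means + 0.3 * (human - item_means) + rng.normal(0.0, 0.5, size=(P, K))

# Population-level accuracy: correlation of item means across the sample.
pop_r = np.corrcoef(human.mean(axis=0), twin.mean(axis=0))[0, 1]

# Within-participant profile correlation: per person, across items.
profile_r = np.mean([np.corrcoef(human[i], twin[i])[0, 1] for i in range(P)])

# Item-level correlation: per item, across persons; this is the statistic
# that attenuates when twins compress individual differences.
item_r = np.mean([np.corrcoef(human[:, k], twin[:, k])[0, 1] for k in range(K)])

print(pop_r, profile_r, item_r)
```

In this toy setup the population-level correlation comes out highest and the item-level correlation lowest, mirroring the attenuation pattern the abstract reports: twins can reproduce aggregate and profile structure while losing the between-person variation that item-level correlations depend on.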
Related papers
- Computational Turing Test Reveals Systematic Differences Between Human and AI Language [0.0]
Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior.
Existing validation efforts rely heavily on human-judgment-based evaluations.
This paper introduces a computational Turing test to assess how closely LLMs approximate human language.
arXiv Detail & Related papers (2025-11-06T08:56:37Z)
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers [14.983442449498739]
This study investigates whether and how Large Language Models can model the correlational structure of human psychological traits from minimal quantitative inputs.
We prompted various LLMs with Big Five Personality Scale responses from 816 human individuals to role-play their responses on nine other psychological scales.
LLMs demonstrated remarkable accuracy in capturing human psychological structure.
arXiv Detail & Related papers (2025-11-05T06:51:13Z)
- TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation [55.55404595177229]
Large Language Models (LLMs) are exhibiting emergent human-like abilities.
TwinVoice is a benchmark for assessing persona simulation across diverse real-world contexts.
arXiv Detail & Related papers (2025-10-29T14:00:42Z)
- Assessing the Human-Likeness of LLM-Driven Digital Twins in Simulating Health Care System Trust [11.272529608962996]
Large Language Model (LLM)-driven Human Digital Twins are showing great potential in healthcare system research.
However, their actual ability to simulate complex human psychological traits, such as distrust in the healthcare system, remains unclear.
This study suggests that current LLM-driven Digital Twins have limitations in modeling complex human attitudes.
arXiv Detail & Related papers (2025-10-27T02:56:22Z)
- A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement [3.418816254588274]
Digital representations of individuals ("digital twins") promise to transform social science and decision-making.
We conducted 19 studies with a representative U.S. panel and their digital twins.
Twins reproduced individual responses with 75% accuracy and seemingly low correlation with human answers.
arXiv Detail & Related papers (2025-09-23T14:42:14Z)
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning [63.25540801694765]
Large Language Models (LLMs) demonstrate striking linguistic abilities, yet whether they strike the same compression-meaning balance as humans remains unclear.
We apply the Information Bottleneck principle to quantitatively compare how LLMs and humans navigate this compression-meaning trade-off.
arXiv Detail & Related papers (2025-05-21T16:29:00Z)
- Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
We found: (1) Psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) Direct probability measurement may not accurately assess linguistic competence; and (3) Instruction tuning does little to change competence but improves performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
- What is a Digital Twin Anyway? Deriving the Definition for the Built Environment from over 15,000 Scientific Publications [0.0]
The study compares these findings with insights from an expert survey that included 52 experts.
We extracted the main components of Digital Twins using Text Frequency Analysis and N-gram analysis.
The analysis of DT components reveals two major groups of DT types: High-Performance Real-Time (HPRT) DTs and Long-Term Decision Support (LTDS) DTs.
arXiv Detail & Related papers (2024-09-21T09:19:29Z)
- Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
- Digital Twins: State of the Art Theory and Practice, Challenges, and Open Research Questions [62.67593386796497]
This work explores the various DT features and current approaches, the shortcomings and reasons behind the delay in the implementation and adoption of digital twin.
The major reasons for this delay are the lack of a universal reference framework, domain dependence, security concerns of shared data, reliance of digital twin on other technologies, and lack of quantitative metrics.
arXiv Detail & Related papers (2020-11-02T19:08:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.