Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
- URL: http://arxiv.org/abs/2602.18462v1
- Date: Fri, 06 Feb 2026 15:13:59 GMT
- Title: Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
- Authors: Erika Elizabeth Taday Morocho, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci
- Abstract summary: We use a large dataset of U.S. microdata to assess the impact of persona-conditioned simulations. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Our findings highlight a key adverse impact of current persona-based simulation practices.
- Score: 0.4277616907160855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous as most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways that undermine subgroup fidelity and risk misleading downstream analyses.
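To make the described evaluation setup concrete, the sketch below shows, in illustrative Python, how persona-conditioned prompting could be scored against ground-truth survey answers alongside a random-guesser baseline. This is a minimal sketch under assumed data structures; the persona fields, item format, and the `model_answer_fn` hook are hypothetical and do not reflect the authors' actual pipeline.

```python
# Minimal, hypothetical sketch of a persona-conditioned survey evaluation.
# The persona attributes, item format, and model hook are illustrative assumptions.
import random
from dataclasses import dataclass

@dataclass
class Respondent:
    persona: dict   # e.g. {"age": 34, "sex": "female", "region": "Midwest"}
    answers: dict   # item_id -> index of the option the real respondent chose

def persona_prompt(persona: dict, question: str, options: list[str]) -> str:
    """Condition the model on multi-attribute demographics before the survey item."""
    attrs = "; ".join(f"{k}: {v}" for k, v in persona.items())
    opts = "\n".join(f"{i}. {o}" for i, o in enumerate(options))
    return (f"You are a survey respondent with these attributes: {attrs}.\n"
            f"Question: {question}\nOptions:\n{opts}\n"
            f"Answer with the number of the option you would choose.")

def evaluate(model_answer_fn, respondents, items) -> dict:
    """Compare a model (via model_answer_fn) and a random baseline
    over every respondent-item instance with a ground-truth answer."""
    hits = {"model": 0, "random": 0}
    total = 0
    for r in respondents:
        for item_id, (question, options) in items.items():
            if item_id not in r.answers:
                continue
            truth = r.answers[item_id]
            pred = model_answer_fn(persona_prompt(r.persona, question, options))
            hits["model"] += int(pred == truth)
            hits["random"] += int(random.randrange(len(options)) == truth)
            total += 1
    return {name: count / total for name, count in hits.items()}
```

A subgroup-level comparison of this kind (accuracy per demographic attribute rather than in aggregate) is what would surface the heterogeneous, subgroup-specific distortions the abstract reports.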
Related papers
- Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility [7.616305266104683]
Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science. We test whether LLM-simulated survey respondents can reproduce human patterns of misinformation belief and sharing.
arXiv Detail & Related papers (2026-02-04T15:48:05Z) - Can Finetuning LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence? [9.310571879281186]
Large language models (LLMs) can serve as substitutes for human participants in survey and experimental research. LLMs often fail to align with real human behavior, exhibiting limited diversity, systematic misalignment for minority subgroups, insufficient within-group variance, and discrepancies between stated beliefs and actions. This study examines whether fine-tuning on a small subset of human survey data, such as that obtainable from a pilot study, can mitigate these issues and yield realistic simulated outcomes.
arXiv Detail & Related papers (2025-11-26T09:50:42Z) - LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions [60.48458130500911]
We investigate whether emergent misalignment can extend beyond safety behaviors to a broader spectrum of dishonesty and deception under high-stakes scenarios. We finetune open-sourced LLMs on misaligned completions across diverse domains. We find that introducing as little as 1% of misalignment data into a standard downstream task is sufficient to decrease honest behavior by over 20%.
arXiv Detail & Related papers (2025-10-09T13:35:19Z) - Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks. We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents. We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
arXiv Detail & Related papers (2025-09-14T15:08:45Z) - Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z) - Prompt Perturbations Reveal Human-Like Biases in Large Language Model Survey Responses [2.3112192919085826]
Large Language Models (LLMs) are increasingly used as proxies for human subjects in social science surveys. Their reliability and susceptibility to known human-like response biases are poorly understood. This work investigates the response robustness of LLMs in normative survey contexts.
arXiv Detail & Related papers (2025-07-09T18:01:50Z) - Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data [4.774576759157642]
Mixed methods research integrates quantitative and qualitative data but faces challenges in aligning their distinct structures. This study investigates whether large language models (LLMs) can reliably predict human survey responses.
arXiv Detail & Related papers (2025-05-28T05:57:26Z) - Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction [5.774786149181393]
We analyze how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs). We find that LLM-generated data fails to replicate the variance observed in real-world human responses. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data.
arXiv Detail & Related papers (2025-02-22T16:25:33Z) - Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.