Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
- URL: http://arxiv.org/abs/2602.18462v1
- Date: Fri, 06 Feb 2026 15:13:59 GMT
- Title: Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
- Authors: Erika Elizabeth Taday Morocho, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci
- Abstract summary: We use a large dataset of U.S. microdata to assess the impact of persona-conditioned simulations. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Our findings highlight a key adverse impact of current persona-based simulation practices.
- Score: 0.4277616907160855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous as most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways that undermine subgroup fidelity and risk misleading downstream analyses.
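To make the described evaluation setup concrete, the sketch below shows, in illustrative Python, how persona-conditioned prompting could be scored against ground-truth survey answers alongside a random-guesser baseline. This is a minimal sketch under assumed data structures; the persona fields, item format, and the `model_answer_fn` hook are hypothetical and do not reflect the authors' actual pipeline.

```python
# Minimal, hypothetical sketch of a persona-conditioned survey evaluation.
# The persona attributes, item format, and model hook are illustrative assumptions.
import random
from dataclasses import dataclass

@dataclass
class Respondent:
    persona: dict   # e.g. {"age": 34, "sex": "female", "region": "Midwest"}
    answers: dict   # item_id -> index of the option the real respondent chose

def persona_prompt(persona: dict, question: str, options: list[str]) -> str:
    """Condition the model on multi-attribute demographics before the survey item."""
    attrs = "; ".join(f"{k}: {v}" for k, v in persona.items())
    opts = "\n".join(f"{i}. {o}" for i, o in enumerate(options))
    return (f"You are a survey respondent with these attributes: {attrs}.\n"
            f"Question: {question}\nOptions:\n{opts}\n"
            f"Answer with the number of the option you would choose.")

def evaluate(model_answer_fn, respondents, items) -> dict:
    """Compare a model (via model_answer_fn) and a random baseline
    over every respondent-item instance with a ground-truth answer."""
    hits = {"model": 0, "random": 0}
    total = 0
    for r in respondents:
        for item_id, (question, options) in items.items():
            if item_id not in r.answers:
                continue
            truth = r.answers[item_id]
            pred = model_answer_fn(persona_prompt(r.persona, question, options))
            hits["model"] += int(pred == truth)
            hits["random"] += int(random.randrange(len(options)) == truth)
            total += 1
    return {name: count / total for name, count in hits.items()}
```

A subgroup-level comparison of this kind (accuracy per demographic attribute rather than in aggregate) is what would surface the heterogeneous, subgroup-specific distortions the abstract reports.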
Related papers
- Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility [7.616305266104683]
Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science. We test whether LLM-simulated survey respondents can reproduce human patterns of misinformation belief and sharing.
arXiv Detail & Related papers (2026-02-04T15:48:05Z) - Can Finetuning LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence? [9.310571879281186]
Large language models (LLMs) can serve as substitutes for human participants in survey and experimental research. LLMs often fail to align with real human behavior, exhibiting limited diversity, systematic misalignment for minority subgroups, insufficient within-group variance, and discrepancies between stated beliefs and actions. This study examines whether fine-tuning on a small subset of human survey data, such as that obtainable from a pilot study, can mitigate these issues and yield realistic simulated outcomes.
arXiv Detail & Related papers (2025-11-26T09:50:42Z) - LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions [60.48458130500911]
We investigate whether emergent misalignment can extend beyond safety behaviors to a broader spectrum of dishonesty and deception under high-stakes scenarios. We finetune open-sourced LLMs on misaligned completions across diverse domains. We find that introducing as little as 1% of misalignment data into a standard downstream task is sufficient to decrease honest behavior by over 20%.
arXiv Detail & Related papers (2025-10-09T13:35:19Z) - Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks. We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents. We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
arXiv Detail & Related papers (2025-09-14T15:08:45Z) - Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation. Our approach begins by leveraging large language models to generate narrative personas from long-term social media data. To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z) - Prompt Perturbations Reveal Human-Like Biases in Large Language Model Survey Responses [2.3112192919085826]
Large Language Models (LLMs) are increasingly used as proxies for human subjects in social science surveys. Their reliability and susceptibility to known human-like response biases are poorly understood. This work investigates the response robustness of LLMs in normative survey contexts.
arXiv Detail & Related papers (2025-07-09T18:01:50Z) - Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data [4.774576759157642]
Mixed methods research integrates quantitative and qualitative data but faces challenges in aligning their distinct structures. This study investigates whether large language models (LLMs) can reliably predict human survey responses.
arXiv Detail & Related papers (2025-05-28T05:57:26Z) - Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction [5.774786149181393]
We analyze how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs). We find that LLM-generated data fails to replicate the variance observed in real-world human responses. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data.
arXiv Detail & Related papers (2025-02-22T16:25:33Z) - Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.