Related papers: Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction

Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction

URL: http://arxiv.org/abs/2502.16280v1
Date: Sat, 22 Feb 2025 16:25:33 GMT
Title: Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Authors: Sarah Ball, Simeon Allmendinger, Frauke Kreuter, Niklas Kühl,
Abstract summary: We analyze how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs)<n>We find that LLM-generated data fails to replicate the variance observed in real-world human responses.<n>In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data.
Score: 5.774786149181393
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative AI (GenAI) is increasingly used in survey contexts to simulate human preferences. While many research endeavors evaluate the quality of synthetic GenAI data by comparing model-generated responses to gold-standard survey results, fundamental questions about the validity and reliability of using LLMs as substitutes for human respondents remain. Our study provides a technical analysis of how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs) and evaluates their suitability for survey-based predictions. Using 14 different models, we find that LLM-generated data fails to replicate the variance observed in real-world human responses, particularly across demographic subgroups. In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data. Moreover, we show that prompt sensitivity can significantly alter outputs for some models, further undermining the stability and predictiveness of LLM-based simulations. As a key contribution, we adapt a probe-based methodology that reveals how LLMs encode political affiliations in their latent space, exposing the systematic distortions introduced by these models. Our findings highlight critical limitations in AI-generated survey data, urging caution in its use for public opinion research, social science experimentation, and computational behavioral modeling.

Related papers

Individual Turing Test: A Case Study of LLM-based Simulation Using Longitudinal Personal Data [54.145424717168794]
Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored.<n>This paper presents a case study to investigate LLM-based individual simulation with a volunteer-contributed archive of private messaging history spanning over ten years.<n>We propose the "Individual Turing Test" to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer.
arXiv Detail & Related papers (2026-03-01T21:46:27Z)
Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility [7.616305266104683]
Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science.<n>We test whether LLM-simulated survey respondents can reproduce human patterns of misinformation belief and sharing.
arXiv Detail & Related papers (2026-02-04T15:48:05Z)
In Silico Development of Psychometric Scales: Feasibility of Representative Population Data Simulation with LLMs [0.5833117322405447]
Development of psychometric scales requires large samples, multiple testing phases, and substantial resources.<n>Recent advances in Large Language Models (LLMs) enable the generation of synthetic participant data.<n>We tested whether LLM-simulated datasets can reproduce the latent structures and measurement properties of human responses.
arXiv Detail & Related papers (2025-12-02T16:26:17Z)
Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks.<n>We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents.<n>We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
arXiv Detail & Related papers (2025-09-14T15:08:45Z)
Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data [4.774576759157642]
Mixed methods research integrates quantitative and qualitative data but faces challenges in aligning their distinct structures.<n>This study investigates whether large language models (LLMs) can reliably predict human survey responses.
arXiv Detail & Related papers (2025-05-28T05:57:26Z)
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z)
Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation [5.4260854377259315]
We introduce a series of quality checks to support early-stage deliberation about the viability of using large language models for simulating human opinions. We demonstrate the usefulness of the proposed quality control tests in the context of AI-assisted content moderation. We conclude by discussing implications of these failure modes and recommend how organizations can utilize our proposed tests for prompt engineering.
arXiv Detail & Related papers (2025-04-11T20:16:02Z)
Simulating and Analysing Human Survey Responses with Large Language Models: A Case Study in Energy Stated Preference [4.672157041593765]
Stated preference (SP) surveys help researchers understand how individuals make trade-offs in hypothetical, potentially futuristic, scenarios.<n>This study investigates large language models (LLMs) for simulating consumer choices in energy-related SP surveys.
arXiv Detail & Related papers (2025-03-07T10:37:31Z)
Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? [1.7819574476785418]
This study explores the potential of Large Language Models (LLMs) to generate artificial surveys.<n>By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods.<n>A novel approach incorporating "Personas" is introduced and compared to five other synthetic survey methods.
arXiv Detail & Related papers (2025-01-20T15:11:03Z)
Large Language Models for Market Research: A Data-augmentation Approach [3.3199591445531453]
Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks.<n>Recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when substituting between the two.<n>We propose a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis.
arXiv Detail & Related papers (2024-12-26T22:06:29Z)
Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations. We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments. But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z)
Chatting Up Attachment: Using LLMs to Predict Adult Bonds [0.0]
We use GPT-4 and Claude 3 Opus to create agents that simulate adults with varying profiles, childhood memories, and attachment styles. We evaluate our models using a transcript dataset from 9 humans who underwent the same interview protocol, analyzed and labeled by mental health professionals. Our findings indicate that training the models using only synthetic data achieves performance comparable to training the models on human data.
arXiv Detail & Related papers (2024-08-31T04:29:19Z)
Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data. We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give. We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications [60.26261850082012]
We introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding. We employ association rules as new representations using association rule mining based on the original features to increase model interpretability.
arXiv Detail & Related papers (2022-09-29T17:44:14Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT. The results show that the alleged superiority of predict based models is more apparent than real, and surely not ubiquitous. We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.