Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
- URL: http://arxiv.org/abs/2502.07068v2
- Date: Wed, 19 Feb 2025 15:05:39 GMT
- Title: Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
- Authors: Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich,
- Abstract summary: We are the first to specialize large language models (LLMs) for simulating survey response distributions.<n>As a testbed, we use country-level results from two global cultural surveys.<n>We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
- Score: 49.908708778200115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale surveys are essential tools for informing social science research and policy, but running surveys is costly and time-intensive. If we could accurately simulate group-level survey results, this would therefore be very valuable to social science research. Prior work has explored the use of large language models (LLMs) for simulating human behaviors, mostly through prompting. In this paper, we are the first to specialize LLMs for the task of simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions for a given question. Then, we show that this method substantially outperforms other methods and zero-shot classifiers, even on unseen questions, countries, and a completely unseen survey. While even our best models struggle with the task, especially on unseen questions, our results demonstrate the benefits of specialization for simulation, which may accelerate progress towards sufficiently accurate simulation in the future.
Related papers
- Individual Turing Test: A Case Study of LLM-based Simulation Using Longitudinal Personal Data [54.145424717168794]
Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored.<n>This paper presents a case study to investigate LLM-based individual simulation with a volunteer-contributed archive of private messaging history spanning over ten years.<n>We propose the "Individual Turing Test" to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer.
arXiv Detail & Related papers (2026-03-01T21:46:27Z) - SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors [58.87134689752605]
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation.<n>We show that even the best LLMs today have limited simulation ability (score: 40.80/100), performance scales log-linearly with model size.<n>We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
arXiv Detail & Related papers (2025-10-20T13:14:38Z) - Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models [34.9734826462006]
This paper systematically investigates the impact that various Survey Response Generation Methods have on predicted survey responses.<n>We find significant differences between the Survey Response Generation Methods in both individual-level and subpopulation-level alignment.<n>Our results show that Restricted Generation Methods perform best overall, and that reasoning output does not consistently improve alignment.
arXiv Detail & Related papers (2025-10-13T16:29:19Z) - Prompts to Proxies: Emulating Human Preferences via a Compact LLM Ensemble [46.82793004650415]
Large language models (LLMs) have demonstrated promise in emulating human-like responses across a range of tasks.<n>We propose a novel alignment framework that treats LLMs as agent proxies for human survey respondents.<n>We introduce P2P, a system that steers LLM agents toward representative behavioral patterns using structured prompt engineering, entropy-based sampling, and regression-based selection.
arXiv Detail & Related papers (2025-09-14T15:08:45Z) - Population-Aligned Persona Generation for LLM-based Social Simulation [58.84363795421489]
We propose a systematic framework for synthesizing high-quality, population-aligned persona sets for social simulation.<n>Our approach begins by leveraging large language models to generate narrative personas from long-term social media data.<n>To address the needs of specific simulation contexts, we introduce a task-specific module that adapts the globally aligned persona set to targeted subpopulations.
arXiv Detail & Related papers (2025-09-12T10:43:47Z) - Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation [18.225151370273093]
This paper explores a new paradigm: simulating virtual survey respondents using Large Language Models (LLMs)<n>We introduce two novel simulation settings, namely Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS)<n>We curate a comprehensive benchmark suite, LLM-S3 (Large Language Model-based Sociodemographic Simulation Survey), that spans 11 real-world public datasets across four sociological domains.
arXiv Detail & Related papers (2025-09-08T04:59:00Z) - Evaluating Local and Cloud-Based Large Language Models for Simulating Consumer Choices in Energy Stated Preference Surveys [4.672157041593765]
This study investigates the ability of large language models to simulate consumer choices in energy-related SP surveys.
Results indicate that while LLMs achieve an average accuracy of up to 48%, their performance remains insufficient for practical application.
Findings suggest that previous SP choices are the most effective input factor, while longer prompts with varied factor formats may reduce accuracy.
arXiv Detail & Related papers (2025-03-07T10:37:31Z) - Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions [4.020002996724124]
Large language models (LLMs) predict survey responses in advance during the early stages of survey design.
We propose directly fine-tuning LLMs to predict response distributions by leveraging unique structural characteristics of survey data.
We show that fine-tuning on SubPOP greatly improves the match between LLM predictions and human responses across various subpopulations.
arXiv Detail & Related papers (2025-02-24T00:31:33Z) - LLM-Mirror: A Generated-Persona Approach for Survey Pre-Testing [0.0]
We investigate whether providing respondents' prior information can replicate both statistical distributions and individual decision-making patterns.<n>We also introduce the concept of the LLM-Mirror, user personas generated by supplying respondent-specific information to the LLM.<n>Our findings show that: (1) PLS-SEM analysis shows LLM-generated responses align with human responses, (2) LLMs are capable of reproducing individual human responses, and (3) LLM-Mirror responses closely follow human responses at the individual level.
arXiv Detail & Related papers (2024-12-04T09:39:56Z) - GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLMs)-based simulation platform called textitGenSim.
Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.
To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z) - Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys [1.5727456947901746]
We conducted millions of simulations in which large language models (LLMs) were asked to answer subjective questions.
A comparison of different LLM responses with the European Social Survey (ESS) data suggests that the effect of prompts on bias and variability is fundamental.
arXiv Detail & Related papers (2024-05-29T17:54:22Z) - Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation [73.58618024960968]
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans.<n>This arouses curiosity regarding the capacity of LLM agents to comprehend probability distributions.<n>Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
arXiv Detail & Related papers (2024-04-13T16:59:28Z) - BASES: Large-scale Web Search User Simulation with Large Language Model
based Agents [108.97507653131917]
BASES is a novel user simulation framework with large language models (LLMs)
Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors.
WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
arXiv Detail & Related papers (2024-02-27T13:44:09Z) - A step towards the integration of machine learning and small area
estimation [0.0]
We propose a predictor supported by machine learning algorithms which can be used to predict any population or subpopulation characteristics.
We study only small departures from the assumed model, to show that our proposal is a good alternative in this case as well.
What is more, we propose the method of the accuracy estimation of machine learning predictors, giving the possibility of the accuracy comparison with classic methods.
arXiv Detail & Related papers (2024-02-12T09:43:17Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Questioning the Survey Responses of Large Language Models [25.14481433176348]
We critically examine the methodology on the basis of the well-established American Community Survey by the U.S. Census Bureau.<n>We establish two dominant patterns. First, models' responses are governed by ordering and labeling biases, for example, towards survey responses labeled with the letter "A"<n>Second, when adjusting for these systematic biases through randomized answer ordering, models across the board trend towards uniformly random survey responses.
arXiv Detail & Related papers (2023-06-13T17:48:27Z) - How Predictable Are Large Language Model Capabilities? A Case Study on
BIG-bench [52.11481619456093]
We study the performance prediction problem on experiment records from BIG-bench.
An $R2$ score greater than 95% indicates the presence of learnable patterns within the experiment records.
We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3times$ smaller.
arXiv Detail & Related papers (2023-05-24T09:35:34Z) - Predicting Survey Response with Quotation-based Modeling: A Case Study
on Favorability towards the United States [0.0]
We propose a pioneering approach for predicting survey responses by examining quotations using machine learning.
We leverage a vast corpus of quotations from individuals across different nationalities to extract their level of favorability.
We employ a combination of natural language processing techniques and machine learning algorithms to construct a predictive model for survey responses.
arXiv Detail & Related papers (2023-05-23T14:11:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.