Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
- URL: http://arxiv.org/abs/2502.07068v2
- Date: Wed, 19 Feb 2025 15:05:39 GMT
- Title: Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
- Authors: Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul Röttger, Daniel Hershcovich
- Abstract summary: We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale surveys are essential tools for informing social science research and policy, but running surveys is costly and time-intensive. If we could accurately simulate group-level survey results, this would therefore be very valuable to social science research. Prior work has explored the use of large language models (LLMs) for simulating human behaviors, mostly through prompting. In this paper, we are the first to specialize LLMs for the task of simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions for a given question. Then, we show that this method substantially outperforms other methods and zero-shot classifiers, even on unseen questions, countries, and a completely unseen survey. While even our best models struggle with the task, especially on unseen questions, our results demonstrate the benefits of specialization for simulation, which may accelerate progress towards sufficiently accurate simulation in the future.
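As a rough illustration of what such a first-token objective could look like, the sketch below reads a causal LM's next-token logits at the answer-option labels and minimizes the KL divergence to the observed survey distribution. The base model, prompt template, and target numbers are placeholders, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in base model; the paper's actual models differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def first_token_loss(prompt, option_labels, target_dist):
    """KL divergence between the observed survey answer distribution and
    the model's next-token probabilities, renormalized over the tokens
    for the answer-option labels (the 'first token' of each answer)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits[0, -1]          # next-token logits
    option_ids = [tokenizer.encode(" " + lab)[0] for lab in option_labels]
    log_probs = F.log_softmax(logits[option_ids], dim=-1)
    return F.kl_div(log_probs, torch.tensor(target_dist), reduction="sum")

# Illustrative training step: one question, one country, invented numbers.
prompt = ("You are answering for a respondent from Denmark.\n"
          "Question: How important is family in your life?\n"
          "Options: A) Very  B) Rather  C) Not very  D) Not at all\n"
          "Answer:")
loss = first_token_loss(prompt, ["A", "B", "C", "D"], [0.62, 0.30, 0.06, 0.02])
loss.backward()  # then step an optimizer as usual
```

Reading the predicted distribution off a single forward pass keeps both training and evaluation cheap, since no sampling is required.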
Related papers
- Evaluating Local and Cloud-Based Large Language Models for Simulating Consumer Choices in Energy Stated Preference Surveys
This study investigates the ability of large language models to simulate consumer choices in energy-related SP surveys.
Results indicate that while LLMs achieve an average accuracy of up to 48%, their performance remains insufficient for practical application.
Findings suggest that previous SP choices are the most effective input factor, while longer prompts with varied factor formats may reduce accuracy.
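A hypothetical prompt builder in that spirit might inject prior stated-preference (SP) choices while keeping extraneous factor formats out of the prompt; the persona fields and option names below are invented for illustration.

```python
def build_sp_prompt(persona, prior_choices, alternatives):
    """Hypothetical prompt for a stated-preference question. Per the
    reported findings, prior choices are included, while extra factor
    formats are kept minimal to avoid hurting accuracy."""
    lines = [f"You are a consumer: {persona}."]
    for i, (options, picked) in enumerate(prior_choices, 1):
        lines.append(f"Previously (choice {i}) you chose '{picked}' from {options}.")
    lines.append(f"Now choose one of: {alternatives}. Answer with the option name only.")
    return "\n".join(lines)

prompt = build_sp_prompt(
    persona="urban renter, mid income",
    prior_choices=[(["fixed tariff", "dynamic tariff"], "dynamic tariff")],
    alternatives=["fixed tariff", "dynamic tariff", "no contract"],
)
print(prompt)
```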
arXiv Detail & Related papers (2025-03-07T10:37:31Z)
- Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
Large language models (LLMs) can predict survey responses in advance during the early stages of survey design.
We propose directly fine-tuning LLMs to predict response distributions by leveraging unique structural characteristics of survey data.
We show that fine-tuning on SubPOP greatly improves the match between LLM predictions and human responses across various subpopulations.
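The exact SubPOP format is not reproduced here, but the sketch below shows one plausible way to turn survey tabulations into (prompt, target-distribution) fine-tuning examples, which a distribution-matching loss such as the first-token KL objective sketched earlier could then fit. Field names are hypothetical.

```python
def make_examples(question, options, counts_by_group):
    """Convert per-group answer counts into (prompt, target) pairs for
    distribution-matching fine-tuning. All field names are illustrative."""
    examples = []
    for group, counts in counts_by_group.items():
        total = sum(counts)
        prompt = (f"Respondent group: {group}\n"
                  f"Question: {question}\n"
                  f"Options: {', '.join(options)}\nAnswer:")
        examples.append({"prompt": prompt,
                         "target": [c / total for c in counts]})
    return examples

examples = make_examples(
    question="Should the government do more to reduce emissions?",
    options=["Yes", "No", "Unsure"],
    counts_by_group={"age 18-29": [210, 60, 30], "age 65+": [120, 140, 40]},
)
```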
arXiv Detail & Related papers (2025-02-24T00:31:33Z)
- LLM-Mirror: A Generated-Persona Approach for Survey Pre-Testing
We investigate whether providing respondents' prior information can replicate both statistical distributions and individual decision-making patterns.
We also introduce the concept of the LLM-Mirror, user personas generated by supplying respondent-specific information to the LLM.
Our findings show that: (1) PLS-SEM analysis shows LLM-generated responses align with human responses, (2) LLMs are capable of reproducing individual human responses, and (3) LLM-Mirror responses closely follow human responses at the individual level.
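A minimal sketch of the persona idea, assuming respondent records with a few demographic fields and prior answers; the template below is illustrative, not the paper's actual LLM-Mirror prompt.

```python
def mirror_persona(respondent):
    """Hypothetical persona prompt built from respondent-specific fields;
    the real LLM-Mirror template is not reproduced here."""
    return (f"You are a {respondent['age']}-year-old {respondent['occupation']} "
            f"from {respondent['country']}. In earlier survey items you said: "
            + " ".join(respondent["prior_answers"])
            + " Answer the next question as this person would.")

persona_prompt = mirror_persona({
    "age": 34, "occupation": "teacher", "country": "South Korea",
    "prior_answers": ['"I mostly shop online."', '"Price matters more than brand."'],
})
print(persona_prompt)
```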
arXiv Detail & Related papers (2024-12-04T09:39:56Z)
- GenSim: A General Social Simulation Platform with Large Language Model based Agents
We propose a novel large language model (LLM)-based simulation platform called GenSim.
Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.
To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z)
- Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans.
This raises the question of whether LLM agents can comprehend probability distributions.
Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
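One simple way to probe that gap is to ask a model to "sample" from a stated distribution many times and measure how far the empirical frequencies drift, e.g., with total variation distance. In the sketch below, a true random draw stands in for the model call.

```python
import random
from collections import Counter

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def llm_sample(options, weights):
    """Placeholder for prompting an LLM to draw one option according to
    stated probabilities; a random draw stands in for the model call."""
    return random.choices(options, weights=weights, k=1)[0]

target = {"A": 0.5, "B": 0.3, "C": 0.2}
draws = Counter(llm_sample(list(target), list(target.values())) for _ in range(1000))
empirical = {k: v / 1000 for k, v in draws.items()}
print(total_variation(target, empirical))  # near 0 means faithful sampling
```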
arXiv Detail & Related papers (2024-04-13T16:59:28Z)
- BASES: Large-scale Web Search User Simulation with Large Language Model based Agents
BASES is a novel user simulation framework built on large language model (LLM)-based agents.
Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors.
WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
arXiv Detail & Related papers (2024-02-27T13:44:09Z)
- A step towards the integration of machine learning and small area estimation
We propose a predictor supported by machine learning algorithms that can be used to predict any population or subpopulation characteristic.
We study only small departures from the assumed model, to show that our proposal is a good alternative in this case as well.
Moreover, we propose a method for estimating the accuracy of machine learning predictors, which makes it possible to compare their accuracy with that of classic methods.
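The summary does not specify the predictor or the accuracy method, but a generic bootstrap estimate of a machine learning predictor's MSE, as sketched below on synthetic data, is one way such an accuracy comparison with classic estimators could be set up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic population data; the real setting would use survey/area data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=500)

def bootstrap_mse(model, X, y, n_boot=100):
    """Bootstrap estimate of a predictor's MSE, evaluated on out-of-bag
    samples, usable to compare ML and classical estimators on equal footing."""
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        oob = np.setdiff1d(np.arange(len(y)), idx)
        if len(oob) == 0:
            continue
        model.fit(X[idx], y[idx])
        errors.append(np.mean((model.predict(X[oob]) - y[oob]) ** 2))
    return float(np.mean(errors))

print(bootstrap_mse(RandomForestRegressor(n_estimators=50, random_state=0), X, y))
```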
arXiv Detail & Related papers (2024-02-12T09:43:17Z)
- Generative Judge for Evaluating Alignment
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Bias and Fairness in Large Language Models: A Survey
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Questioning the Survey Responses of Large Language Models
We critically examine the methodology on the basis of the well-established American Community Survey by the U.S. Census Bureau.
We establish two dominant patterns. First, models' responses are governed by ordering and labeling biases, for example, towards survey responses labeled with the letter "A".
Second, when adjusting for these systematic biases through randomized answer ordering, models across the board trend towards uniformly random survey responses.
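The randomized-ordering adjustment can be sketched as follows: shuffle the answer options across repeated queries and average each option's probability so that position and label effects cancel. Here, model_option_probs is a placeholder for a real readout such as first-token probabilities.

```python
import random

def model_option_probs(question, labeled_options):
    """Placeholder for the model's probability over the presented options
    (e.g., via first-token probabilities); uniform here for illustration."""
    return [1.0 / len(labeled_options)] * len(labeled_options)

def debiased_probs(question, options, n_orders=20):
    """Average per-option probabilities over randomized answer orderings
    so that position and label biases (e.g., preferring 'A') cancel out."""
    totals = {opt: 0.0 for opt in options}
    for _ in range(n_orders):
        order = random.sample(options, len(options))
        labeled = [f"{chr(65 + i)}) {opt}" for i, opt in enumerate(order)]
        for opt, p in zip(order, model_option_probs(question, labeled)):
            totals[opt] += p
    return {opt: s / n_orders for opt, s in totals.items()}

print(debiased_probs("How often do you commute by car?", ["Daily", "Weekly", "Never"]))
```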
arXiv Detail & Related papers (2023-06-13T17:48:27Z)
- How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
We study the performance prediction problem on experiment records from BIG-bench.
An $R^2$ score greater than 95% indicates the presence of learnable patterns within the experiment records.
We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3\times$ smaller.
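A toy version of the performance-prediction setup: fit a regressor on experiment records and check the held-out $R^2$. The synthetic features and ridge model below are stand-ins, not the paper's actual setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for experiment records: real features might include
# model family, parameter count, or few-shot count; two numerics suffice here.
rng = np.random.default_rng(1)
X = rng.uniform(size=(400, 2))  # e.g., scaled log(params), n_shots
y = 0.3 + 0.5 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.02, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = Ridge().fit(X_tr, y_tr)
print(r2_score(y_te, reg.predict(X_te)))  # > 0.95 signals learnable patterns
```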
arXiv Detail & Related papers (2023-05-24T09:35:34Z)
- Predicting Survey Response with Quotation-based Modeling: A Case Study on Favorability towards the United States
We propose a pioneering approach for predicting survey responses by examining quotations using machine learning.
We leverage a vast corpus of quotations from individuals across different nationalities to extract their level of favorability.
We employ a combination of natural language processing techniques and machine learning algorithms to construct a predictive model for survey responses.
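A minimal sketch of such a pipeline using TF-IDF features and logistic regression; the quotes and labels below are invented, and the paper's actual feature set and model are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy illustration: classify favorability from quotation text, which could
# then be aggregated per nationality to predict survey-level favorability.
quotes = ["Their universities are excellent.", "I distrust their foreign policy.",
          "Great place to do business.", "Their media coverage is unfair."]
labels = [1, 0, 1, 0]  # 1 = favorable toward the country in question

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(quotes, labels)
print(clf.predict_proba(["Their products are reliable."])[0, 1])
```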
arXiv Detail & Related papers (2023-05-23T14:11:01Z)