Predicting Field Experiments with Large Language Models
- URL: http://arxiv.org/abs/2504.01167v1
- Date: Tue, 01 Apr 2025 20:14:35 GMT
- Title: Predicting Field Experiments with Large Language Models
- Authors: Yaoyu Chen, Yuheng Hu, Yingda Lu
- Abstract summary: We propose and evaluate an automated framework that produces predictions of field experiment outcomes. Applying this framework to 319 experiments drawn from renowned economics literature yields a notable prediction accuracy of 78%. We find, however, that performance is highly skewed, and we attribute this skewness to several factors, including gender differences, ethnicity, and social norms.
- Score: 0.6144680854063939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated unprecedented emergent capabilities, including content generation, translation, and the simulation of human behavior. Field experiments, despite their high cost, are widely employed in economics and the social sciences to study real-world human behavior through carefully designed manipulations and treatments. However, whether and how these models can be utilized to predict outcomes of field experiments remains unclear. In this paper, we propose and evaluate an automated LLM-based framework that produces predictions of field experiment outcomes. Applying this framework to 319 experiments drawn from renowned economics literature yields a notable prediction accuracy of 78%. Interestingly, we find that performance is highly skewed. We attribute this skewness to several factors, including gender differences, ethnicity, and social norms.
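The abstract does not include code, but the core loop of such a framework is easy to picture. Below is a minimal, hypothetical sketch, assuming a chat-completion API (here the `openai` Python client) and a prompt template of our own invention; the authors' actual prompts, model choice, and answer parsing may differ.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt template; the paper's actual wording is not given here.
PROMPT = """You are an expert economist.
Field experiment: {description}
Treatment: {treatment}
Outcome measured: {outcome}
Will the treatment increase, decrease, or leave the outcome unchanged?
Answer with exactly one word: increase, decrease, or unchanged."""

def predict_outcome(description: str, treatment: str, outcome: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        temperature=0,
        messages=[{"role": "user", "content": PROMPT.format(
            description=description, treatment=treatment, outcome=outcome)}],
    )
    return resp.choices[0].message.content.strip().lower()
```

Running this over a corpus of experiments with known results and comparing the parsed one-word answers against the published effect directions yields an accuracy figure of the kind reported above.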
Related papers
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
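As a rough illustration of what a first-token-probability objective involves, the sketch below reads a causal LM's next-token distribution restricted to the answer options and compares it with an observed response distribution via KL divergence. The model, prompt format, and numbers are placeholders, not the paper's.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def first_token_distribution(question: str, options: list[str]) -> torch.Tensor:
    prompt = f"{question}\nAnswer ({'/'.join(options)}): "
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]             # next-token logits
    first_ids = [tok.encode(" " + o)[0] for o in options]  # first token of each option
    return F.softmax(logits[first_ids], dim=-1)

pred = first_token_distribution("Is religion important in your life?", ["yes", "no"])
actual = torch.tensor([0.62, 0.38])                   # made-up survey shares
loss = F.kl_div(pred.log(), actual, reduction="sum")  # divergence a fine-tuner would minimize
```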
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment. First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population. We propose Deconfounded Empirical Risk Minimization (DERM), a new, simple learning procedure that minimizes the risk over a fictitious target population.
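DERM's precise construction is not given in this summary; as a loose illustration of the general idea, assuming it amounts to reweighting the empirical risk toward a target population, one can contrast plain ERM with a weighted risk (weights `w` taken as given here):

```python
import torch

def erm_risk(model, x, y, loss_fn):
    # standard empirical risk: uniform average over the labeled source sample
    return loss_fn(model(x), y).mean()

def reweighted_risk(model, x, y, w, loss_fn):
    # loss_fn must return per-example losses (e.g., reduction="none");
    # w tilts the source sample toward a fictitious target population
    per_example = loss_fn(model(x), y)
    return (w * per_example).sum() / w.sum()
```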
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents [0.0]
We study economic behavior across diverse, non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations. We use Large Language Models (LLMs) to create synthetic cultural agents (SCAs) representing these populations. Our results demonstrate substantial cross-cultural variability in experimental behavior.
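A synthetic cultural agent can be as simple as a persona prompt wrapped around a classic economic game. The sketch below is our guess at the pattern, with `query_llm` standing in for any chat-completion call:

```python
def sca_prompt(country: str, endowment: int) -> str:
    # hypothetical persona + dictator-game prompt, not the paper's template
    return (
        f"You are an average adult living in {country}. "
        f"You have received {endowment} tokens in a dictator game. "
        "How many tokens do you give to an anonymous stranger? "
        "Reply with a single integer."
    )

def sample_offer(query_llm, country: str, endowment: int = 10) -> int:
    return int(query_llm(sca_prompt(country, endowment)).strip())
```

Repeating `sample_offer` across countries and many draws yields a synthetic cross-cultural distribution of offers that can be compared against field data.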
arXiv Detail & Related papers (2025-01-12T15:06:28Z)
- Simulating Field Experiments with Large Language Models [0.6144680854063939]
This paper pioneers the use of large language models (LLMs) for simulating field experiments.
By introducing two novel prompting strategies, observer and participant modes, we demonstrate the ability of LLMs to both predict outcomes and replicate participant responses within complex field settings.
Our findings indicate a promising alignment with actual experimental results in certain scenarios, achieving a simulation accuracy of 66% in observer mode.
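The two prompting strategies can be pictured as two templates; the versions below are our own hedged reconstruction, not the paper's exact wording:

```python
# Observer mode: the model predicts the aggregate outcome of the experiment.
OBSERVER_TEMPLATE = (
    "You are a social scientist reviewing a field experiment design:\n"
    "{design}\n"
    "Predict the aggregate result, e.g., which arm shows the higher outcome."
)

# Participant mode: the model role-plays a single subject in one condition.
PARTICIPANT_TEMPLATE = (
    "You are a participant in the following situation:\n"
    "{scenario}\n"
    "You were assigned to the {arm} condition. "
    "Describe, in the first person, what you do."
)
```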
arXiv Detail & Related papers (2024-08-19T03:41:43Z)
- Can Language Models Use Forecasting Strategies? [14.332379032371612]
We describe experiments using a novel dataset of real-world events and associated human predictions.
We find that models still struggle to make accurate predictions about the future.
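Forecasts on such datasets are typically scored with a proper scoring rule; the snippet below shows the standard Brier score (a general technique, not a detail from this paper):

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    # mean squared error between forecast probabilities and 0/1 outcomes
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Lower is better; always answering 0.5 scores 0.25.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))  # ~0.137
```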
arXiv Detail & Related papers (2024-06-06T19:01:42Z)
- A Theory of LLM Sampling: Part Descriptive and Part Prescriptive [53.08398658452411]
Large Language Models (LLMs) are increasingly utilized in autonomous decision-making.
We show that this sampling behavior resembles that of human decision-making.
We show that this deviation of a sample from the statistical norm towards a prescriptive component consistently appears in concepts across diverse real-world domains.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
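ASPEST's sampling strategy is not spelled out in this summary, but the selective-prediction half is easy to sketch generically: abstain whenever the model's confidence falls below a threshold, deferring those examples to a human or, in the active setting, to labeling.

```python
import numpy as np

def predict_or_abstain(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """probs: (n_examples, n_classes) softmax outputs; -1 marks abstention."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)

probs = np.array([[0.97, 0.03], [0.55, 0.45]])
print(predict_or_abstain(probs))  # [ 0 -1]: the second example is deferred
```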
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods [20.2027063607352]
We present an experimental study extending a prior explainable ML evaluation experiment and bringing the setup closer to the deployment setting.
Our empirical study draws dramatically different conclusions than the prior work, highlighting how seemingly trivial experimental design choices can yield misleading results.
We believe this work holds lessons about the necessity of situating the evaluation of any ML method and choosing appropriate tasks, data, users, and metrics to match the intended deployment contexts.
arXiv Detail & Related papers (2022-06-24T14:46:19Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is figuring out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Bayesian Optimal Experimental Design for Simulator Models of Cognition [14.059933880568908]
We combine recent advances in BOED and approximate inference for intractable models to find optimal experimental designs.
Our simulation experiments on multi-armed bandit tasks show that our method results in improved model discrimination and parameter estimation.
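For intuition, a nested-Monte-Carlo estimate of expected information gain (EIG) for a toy Bernoulli simulator looks like the sketch below; the simulator, prior, and sample sizes are invented, and the paper's method is more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

def success_prob(theta, design):
    # toy simulator: outcome probability depends on parameter and design
    return 1 / (1 + np.exp(-(theta - design)))

def eig(design, n_outer=4000, n_inner=4000):
    theta = rng.normal(0, 1, n_outer)                 # prior draws
    p = success_prob(theta, design)
    y = rng.random(n_outer) < p                       # simulated outcomes
    log_lik = np.where(y, np.log(p), np.log1p(-p))
    m = success_prob(rng.normal(0, 1, n_inner), design).mean()  # marginal P(y=1)
    log_marg = np.where(y, np.log(m), np.log1p(-m))
    return (log_lik - log_marg).mean()                # estimated information gain

designs = np.linspace(-2.0, 2.0, 9)
best = max(designs, key=eig)                          # design with highest EIG
```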
arXiv Detail & Related papers (2021-10-29T09:04:01Z)
- Predicting Performance for Natural Language Processing Tasks [128.34208911925424]
We build regression models to predict the evaluation score of an NLP experiment given the experimental settings as input.
Experimenting on 9 different NLP tasks, we find that our predictors can produce meaningful predictions over unseen languages and different modeling architectures.
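The recipe is concrete enough to sketch: featurize each experimental setting and fit a regressor to the observed scores. The features, values, and model below are invented placeholders, not the paper's setup.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [log10 training size, num languages, uses transformer (0/1)]
X = [[4.0, 1, 1], [5.2, 2, 1], [3.1, 1, 0], [4.7, 3, 1]]
y = [61.2, 74.5, 48.9, 70.1]   # made-up evaluation scores

reg = GradientBoostingRegressor().fit(X, y)
print(reg.predict([[4.5, 2, 1]]))  # predicted score for an unseen setting
```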
arXiv Detail & Related papers (2020-05-02T16:02:18Z)