Sense and Sensitivity: Evaluating the simulation of social dynamics via Large Language Models
- URL: http://arxiv.org/abs/2412.05093v1
- Date: Fri, 06 Dec 2024 14:50:01 GMT
- Title: Sense and Sensitivity: Evaluating the simulation of social dynamics via Large Language Models
- Authors: Da Ju, Adina Williams, Brian Karrer, Maximilian Nickel
- Abstract summary: Large language models have been proposed as a powerful replacement for classical agent-based models (ABMs) to simulate social dynamics.
However, due to the black box nature of LLMs, it is unclear whether LLM agents actually execute the intended semantics.
We show that while it is possible to engineer prompts that approximate the intended dynamics, the quality of these simulations is highly sensitive to the particular choice of prompts.
- Score: 27.313165173789233
- Abstract: Large language models have increasingly been proposed as a powerful replacement for classical agent-based models (ABMs) to simulate social dynamics. By using LLMs as a proxy for human behavior, this new approach aims to simulate significantly more complex dynamics than classical ABMs and to yield new insights in fields such as social science, political science, and economics. However, due to the black-box nature of LLMs, it is unclear whether LLM agents actually execute the intended semantics that are encoded in their natural language instructions, and whether the resulting dynamics of their interactions are meaningful. To study this question, we propose a new evaluation framework that grounds LLM simulations within the dynamics of established reference models of social science. By treating the LLM as a black-box function, we evaluate its input-output behavior relative to this reference model, which allows us to assess detailed aspects of agent behavior. Our results show that, while it is possible to engineer prompts that approximate the intended dynamics, the quality of these simulations is highly sensitive to the particular choice of prompts. Importantly, simulations are even sensitive to arbitrary variations such as minor wording changes and whitespace. This calls into question the usefulness of current versions of LLMs for meaningful simulations, as without a reference model, it is impossible to determine a priori what impact seemingly meaningless changes to the prompt will have on the simulation.
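To make the evaluation idea concrete, here is a minimal sketch (not the paper's code) of how one might probe prompt sensitivity against a classical reference model. The names are placeholders: `query_llm` is a hypothetical stand-in for a real LLM call, a DeGroot-style opinion-averaging rule stands in for the reference dynamics, and the templates differ only in wording and whitespace.

```python
# Minimal sketch (not the paper's code) of the kind of black-box evaluation the
# abstract describes: an LLM agent's update rule is compared against a classical
# reference model under prompts that differ only in wording and whitespace.
# Assumptions: `query_llm` is a hypothetical stand-in for a real LLM call, and a
# DeGroot-style opinion-averaging rule serves as the reference dynamics.
import re
import statistics

def reference_update(own: float, neighbors: list[float]) -> float:
    """Reference model: the next opinion is the mean of own and neighboring opinions."""
    return statistics.mean([own] + neighbors)

def build_prompt(own: float, neighbors: list[float], template: str) -> str:
    """Instantiate a natural-language instruction encoding the intended semantics."""
    return template.format(own=own, neighbors=", ".join(f"{v:.2f}" for v in neighbors))

# Semantically equivalent templates that differ only in whitespace or wording,
# mirroring the 'arbitrary variations' the abstract refers to.
TEMPLATES = [
    "Your opinion is {own:.2f}. Your neighbors hold {neighbors}. "
    "Reply with your updated opinion as a single number.",
    "Your opinion is {own:.2f}.  Your neighbors hold {neighbors}.  "
    "Reply with your updated opinion as a single number.",          # extra whitespace
    "You currently believe {own:.2f}; the people around you believe {neighbors}. "
    "State your new opinion as one number.",                        # minor rewording
]

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call. Replace with a real chat-completion request; here it
    returns a plausible numeric reply so the sketch runs offline."""
    numbers = [float(x) for x in re.findall(r"\d+\.\d+", prompt)]
    return f"{statistics.mean(numbers):.2f}"

def parse_opinion(reply: str) -> float:
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"Could not parse an opinion from: {reply!r}")
    return float(match.group())

def prompt_sensitivity(own: float, neighbors: list[float]) -> list[float]:
    """Absolute deviation of the agent from the reference model, one value per template."""
    target = reference_update(own, neighbors)
    return [abs(parse_opinion(query_llm(build_prompt(own, neighbors, t))) - target)
            for t in TEMPLATES]

if __name__ == "__main__":
    for template, error in zip(TEMPLATES, prompt_sensitivity(0.30, [0.60, 0.80, 0.10])):
        print(f"deviation {error:.3f} for template starting: {template[:40]!r}")
```

With a real model behind `query_llm`, a large spread in these deviations across the templates would reproduce the prompt sensitivity the abstract reports.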
Related papers
- GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator [55.02281855589641]
GauSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics, modeling each kernel as a continuous piece of matter to account for realistic deformations without idealized assumptions.
GauSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z) - GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLM)-based simulation platform called GenSim.
Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.
To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z) - Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments [1.4999444543328293]
Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment.
We propose Hyp-Mix, a simulation authoring framework that allows experts to develop and evaluate simulations by combining testable hypotheses about learner behavior.
arXiv Detail & Related papers (2024-10-03T00:25:40Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
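A rough illustration of the quantity involved (not that paper's implementation): the snippet below applies a generic maximum-likelihood (Levina-Bickel) estimator of local intrinsic dimension to a set of activation vectors; the synthetic activations and the neighborhood size k are placeholder assumptions.

```python
# Sketch of a generic MLE estimator of local intrinsic dimension (LID),
# applied to hidden-layer activation vectors; hypothetical data, not the paper's code.
import numpy as np

def lid_mle(point: np.ndarray, reference: np.ndarray, k: int = 20) -> float:
    """Estimate the LID of `point` from its k nearest neighbors in `reference`."""
    dists = np.linalg.norm(reference - point, axis=1)
    dists = np.sort(dists[dists > 0])[:k]               # k nearest non-zero distances
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

# Toy usage: activations lying near a 3-dimensional manifold embedded in R^64
# should give an LID estimate close to 3, far below the ambient dimension.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 64))
print(round(lid_mle(acts[0], acts[1:]), 2))
```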
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - LLM-driven Imitation of Subrational Behavior: Illusion or Reality? [3.2365468114603937]
Existing work highlights the ability of Large Language Models to address complex reasoning tasks and mimic human communication.
We propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies.
We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios.
arXiv Detail & Related papers (2024-02-13T19:46:39Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Simulating Opinion Dynamics with Networks of LLM-based Agents [7.697132934635411]
We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs).
Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality.
After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research.
arXiv Detail & Related papers (2023-11-16T07:01:48Z) - CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z) - Likelihood-Free Inference in State-Space Models with Unknown Dynamics [71.94716503075645]
We introduce a method for inferring and predicting latent states in state-space models where observations can only be simulated, and transition dynamics are unknown.
We propose a way of doing likelihood-free inference (LFI) of states and state prediction with a limited number of simulations.
arXiv Detail & Related papers (2021-11-02T12:33:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.