Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- URL: http://arxiv.org/abs/2403.05020v4
- Date: Fri, 04 Oct 2024 02:01:34 GMT
- Title: Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- Authors: Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap,
- Abstract summary: Large language models (LLM) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has used a more omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents in the real world.
- Score: 24.613282867543244
- License:
- Abstract: Recent advances in large language models (LLM) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., single LLM to generate all interlocutors), which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
Related papers
- Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation [51.44040615856536]
This paper analyzes large language models' ability to simulate social media engagement through action guided response generation.
We benchmark GPT-4o-mini, O1-mini, and DeepSeek-R1 in social media engagement simulation regarding a major societal event.
arXiv Detail & Related papers (2025-02-17T17:43:08Z) - Sense and Sensitivity: Evaluating the simulation of social dynamics via Large Language Models [27.313165173789233]
Large language models have been proposed as a powerful replacement for classical agent-based models (ABMs) to simulate social dynamics.
However, due to the black box nature of LLMs, it is unclear whether LLM agents actually execute the intended semantics.
We show that while it is possible to engineer prompts that approximate the intended dynamics, the quality of these simulations is highly sensitive to the particular choice of prompts.
arXiv Detail & Related papers (2024-12-06T14:50:01Z) - GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLMs)-based simulation platform called textitGenSim.
Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.
To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z) - Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue [25.89926022671521]
We generate a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset.
We find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along the multiple textual properties.
arXiv Detail & Related papers (2024-09-12T18:00:18Z) - Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations.
We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments.
But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z) - MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks.
We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios.
We develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z) - Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs)
We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.
We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z) - Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation [43.913403294346686]
We present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query.
We fine-tune the LLM with MATRIX to ensure adherence to human values without compromising inference speed.
Our method outperforms over 10 baselines across 4 benchmarks.
arXiv Detail & Related papers (2024-02-08T14:21:03Z) - CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.