Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- URL: http://arxiv.org/abs/2403.05020v3
- Date: Thu, 18 Apr 2024 18:55:07 GMT
- Title: Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- Authors: Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap
- Abstract summary: Large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has used a more omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world.
- Score: 24.613282867543244
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.
Related papers
- MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks.
We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios.
We develop MathChat sync, a synthetic dialogue-based math dataset for LLM finetuning, focusing on improving models' interaction and instruction-following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z)
- Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View [21.341128731357415]
Large language models (LLMs) have been shown to face hallucination issues because the data they are trained on often contains human bias.
We propose CogMir, an open-ended Multi-LLM Agents framework that utilizes hallucination properties to assess and enhance LLM Agents' social intelligence.
arXiv Detail & Related papers (2024-05-23T16:13:33Z)
- LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities [0.0]
Integrating large language models with agent-based simulations offers transformational potential for understanding complex social systems.
We explore architectures and methods to systematically develop LLM-augmented social simulations.
We conclude that integrating LLMs with agent-based simulations offers a powerful toolset for researchers and scientists.
arXiv Detail & Related papers (2024-05-08T08:57:54Z)
- Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents [101.17919953243107]
GovSim is a generative simulation platform designed to study strategic interactions and cooperative decision-making in large language models (LLMs).
We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%.
We show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, are able to achieve significantly better sustainability.
arXiv Detail & Related papers (2024-04-25T15:59:16Z)
- Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation [43.913403294346686]
We present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query.
We fine-tune the LLM with MATRIX to ensure adherence to human values without compromising inference speed.
Our method outperforms more than 10 baselines across 4 benchmarks.
arXiv Detail & Related papers (2024-02-08T14:21:03Z)
- Systematic Biases in LLM Simulations of Debates [14.12892960275563]
This study highlights the limitations of Large Language Models (LLMs) in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases despite being directed to debate from certain political perspectives.
This tendency results in behavioral patterns that seem to deviate from well-established social dynamics among humans.
arXiv Detail & Related papers (2024-02-06T14:51:55Z)
- LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay [57.202649879872624]
We present a novel framework designed to seamlessly adapt to Avalon gameplay.
The core of our proposed framework is a multi-agent system that enables efficient communication and interaction among agents.
Our results demonstrate the effectiveness of our framework in generating adaptive and intelligent agents.
arXiv Detail & Related papers (2023-10-23T14:35:26Z)
- CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)