Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations
- URL: http://arxiv.org/abs/2403.03407v4
- Date: Thu, 03 Oct 2024 03:51:03 GMT
- Title: Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations
- Authors: Max Lamparth, Anthony Corso, Jacob Ganz, Oriana Skylar Mastro, Jacquelyn Schneider, Harold Trinkunas,
- Abstract summary: We show that large language models (LLMs) behave differently compared to humans in high-stakes military decision-making scenarios.
Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
- Score: 1.6108153271585284
- Abstract: To some, the advent of artificial intelligence (AI) promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs) that can be applied to many tasks, behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 214 national security experts designed to examine crisis escalation in a fictional U.S.-China scenario and compare the behavior of human player teams to LLM-simulated team responses in separate simulations. Here, we find that the LLM-simulated responses can be more aggressive and significantly affected by changes in the scenario. We show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between a team of players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as "pacifist" or "aggressive sociopath." When probing behavioral consistency across individual moves of the simulation, the tested LLMs deviated from each other but generally showed somewhat consistent behavior. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
Related papers
- Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games [7.504095239018173]
Large Language Model (LLM)-based agents increasingly undertake real-world tasks and engage with human society.
This study investigates how different personas and experimental framings affect these AI agents' altruistic behavior.
Despite being trained on extensive human-generated data, these AI agents cannot accurately predict human decisions.
arXiv Detail & Related papers (2024-10-28T17:47:41Z)
- Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game [3.8284679578037246]
We use the language logic game "Who is Undercover?" as an experimental platform to propose the Multi-Perspective Team Tactic (MPTT) framework.
MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimensional thinking, and self-perception in complex scenarios.
Preliminary results show that MPTT, combined with the Who is Undercover? (WIU) game, leverages LLMs' cognitive capabilities to create a decision-making framework that can simulate real society.
arXiv Detail & Related papers (2024-10-20T06:41:31Z)
- Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations [12.887834116390358]
We use a metric based on BERTScore to measure response inconsistency quantitatively.
We show that all five tested LMs exhibit levels of inconsistency that indicate semantic differences.
We recommend further consideration be taken before using LMs to inform military decisions.
arXiv Detail & Related papers (2024-10-17T04:12:17Z)
- FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas [23.26678104324838]
We introduced FairMindSim, which simulates the moral dilemma through a series of unfair scenarios.
We used LLM agents to simulate human behavior, ensuring alignment across various stages.
Our findings indicate that, behaviorally, GPT-4o exhibits a stronger sense of social justice, while humans display a richer range of emotions.
arXiv Detail & Related papers (2024-10-14T11:39:05Z)
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? [0.1474723404975345]
We study the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility.
We find that Llama2 tends not to initiate defection but it adopts a cautious approach towards cooperation.
In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior.
arXiv Detail & Related papers (2024-06-19T14:51:14Z)
- ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z)
- Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy."
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
- Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs).
To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities.
We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)