Related papers: Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing

URL: http://arxiv.org/abs/2309.05898v1
Date: Tue, 12 Sep 2023 00:54:15 GMT
Title: Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing
Authors: Nunzio Lorè, Babak Heydari
Abstract summary: This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2. Utilizing four canonical two-player games, we explore how these models navigate social dilemmas.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2, within the framework of game theory. Utilizing four canonical two-player games -- Prisoner's Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight -- we explore how these models navigate social dilemmas, situations where players can either cooperate for a collective benefit or defect for individual gain. Crucially, we extend our analysis to examine the role of contextual framing, such as diplomatic relations or casual friendships, in shaping the models' decisions. Our findings reveal a complex landscape: while GPT-3.5 is highly sensitive to contextual framing, it shows limited ability to engage in abstract strategic reasoning. Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and context, but LLaMa-2 exhibits a more nuanced understanding of the games' underlying mechanics. These results highlight the current limitations and varied proficiencies of LLMs in strategic decision-making, cautioning against their unqualified use in tasks requiring complex strategic reasoning.
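To make the differences between the four games concrete, here is a minimal Python sketch. It assumes the usual symmetric 2x2 parameterization with payoffs R (reward for mutual cooperation), S (sucker's payoff), T (temptation to defect), and P (punishment for mutual defection). The numeric values are hypothetical; only their ordering defines each game. The sketch enumerates pure-strategy Nash equilibria, which shows why the same cooperate/defect choice carries different strategic weight in each game.

```python
from itertools import product

C, D = "C", "D"  # cooperate, defect
ACTIONS = (C, D)

# Hypothetical payoff values; only the ordering of R, S, T, P matters here.
GAMES = {
    "Prisoner's Dilemma": dict(R=3, S=0, T=5, P=1),  # T > R > P > S
    "Stag Hunt":          dict(R=5, S=0, T=3, P=1),  # R > T > P > S
    "Snowdrift":          dict(R=3, S=1, T=5, P=0),  # T > R > S > P
    "Prisoner's Delight": dict(R=5, S=1, T=3, P=0),  # R > T > S > P
}


def payoff(mine, theirs, g):
    """Payoff to a player choosing `mine` against an opponent choosing `theirs`."""
    table = {(C, C): g["R"], (C, D): g["S"], (D, C): g["T"], (D, D): g["P"]}
    return table[(mine, theirs)]


def pure_nash_equilibria(g):
    """Return every (row, column) profile where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, ACTIONS):
        row_ok = all(payoff(a, b, g) >= payoff(x, b, g) for x in ACTIONS)
        # The game is symmetric, so the column player's payoff is payoff(b, a).
        col_ok = all(payoff(b, a, g) >= payoff(y, a, g) for y in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    for name, g in GAMES.items():
        print(f"{name:20s} {pure_nash_equilibria(g)}")
```

With these orderings, mutual defection is the only equilibrium of the Prisoner's Dilemma even though mutual cooperation pays both players more. The Stag Hunt has two equilibria (mutual cooperation and mutual defection), Snowdrift has the two asymmetric outcomes, and Prisoner's Delight has mutual cooperation alone.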

Related papers

Playing games with Large language models: Randomness and strategy
We investigate whether large language models (LLMs) can play games, examining their capabilities for randomisation and strategic adaptation. We focus on GPT-4o-Mini-2024-08-17 and test two games between LLMs: Rock Paper Scissors (RPS) and a game of strategy, the Prisoner's Dilemma (PD). Our research reveals that LLMs appear to develop loss-aversion strategies in repeated games: RPS converges to stalemate conditions, while PD shows systematic shifts between cooperative and competitive outcomes depending on prompt design.
Date: 2025-03-04T13:04:48Z
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay
We use games such as Tic-Tac-Toe, Connect Four, and Battleship to assess strategic thinking and decision-making. Despite their proficiency on standard benchmarks, GPT-3.5 and GPT-4 show only mediocre ability to play and reason about fully observable games without pre-training.
Date: 2024-07-12T14:17:26Z
Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games
Large Language Models (LLMs) are increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored. This work investigates the performance and merits of LLMs in two canonical game-theoretic two-player non-zero-sum games, Stag Hunt and Prisoner's Dilemma. A structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that, when making decisions in these games, these models are affected by at least one of several systematic biases.
Date: 2024-07-05T12:30:02Z
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
We introduce GameBench, a cross-domain benchmark for evaluating the strategic reasoning abilities of large language models. Our evaluations use GPT-3 and GPT-4 in their base form, along with two scaffolding frameworks designed to enhance strategic reasoning ability: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). Our results show that none of the tested models matches human performance, and in the worst case GPT-4 performs worse than random action.
Date: 2024-06-07T00:28:43Z
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
Large Language Models (LLMs) are being integrated into critical real-world applications. This paper evaluates LLMs' reasoning abilities in competitive environments. We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks.
Date: 2024-02-19T18:23:36Z
K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning
Strategic reasoning requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments. We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)".
Date: 2024-02-02T16:07:05Z
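The summary does not spell out what higher-order beliefs look like in practice. As a hedged illustration of the general idea of level-k thinking (not the K-R framework itself), the sketch below uses the classic "guess p times the average" game: a level-0 player makes a fixed naive guess, and a level-k player best-responds to the belief that everyone else reasons at level k-1. The function names, the default p = 2/3, and the level-0 guess of 50 are assumptions chosen for illustration, and the best response ignores the player's own small effect on the average.

```python
def best_response(expected_average: float, p: float) -> float:
    """Best guess given a belief about the average, clipped to the range [0, 100]."""
    return min(100.0, max(0.0, p * expected_average))


def level_k_guess(k: int, p: float = 2 / 3, level0_guess: float = 50.0) -> float:
    """Hypothetical helper: level-0 guesses naively; level-k best-responds to level-(k-1)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    guess = level0_guess
    for _ in range(k):
        # Each additional level assumes all opponents reason exactly one level lower.
        guess = best_response(guess, p)
    return guess


if __name__ == "__main__":
    for k in range(5):
        print(k, round(level_k_guess(k), 2))
```

Each extra level of reasoning shrinks the guess by a factor of p, so deeper reasoners move toward 0, the game's Nash equilibrium. This is one simple way to see why the depth of an agent's beliefs about its opponents changes its strategy.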
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
We introduce CivRealm, an environment inspired by the Civilization game. CivRealm stands as a unique learning and reasoning challenge for decision-making agents.
Date: 2024-01-19T09:14:11Z
ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents
Alympics is a systematic simulation framework that uses Large Language Model (LLM) agents for game theory research. Alympics provides a versatile platform for studying complex game theory problems.
Date: 2023-11-06T16:03:46Z
SPRING: Studying the Paper and Reasoning to Play Games
We propose a novel approach, SPRING, which reads the game's original academic paper and uses the knowledge learned to reason about and play the game through a large language model (LLM). In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts in the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential for completing sophisticated high-level trajectories.
Date: 2023-05-24T18:14:35Z

This list is automatically generated from the titles and abstracts of the papers on this site.