Strategic Behavior of Large Language Models: Game Structure vs.
Contextual Framing
- URL: http://arxiv.org/abs/2309.05898v1
- Date: Tue, 12 Sep 2023 00:54:15 GMT
- Title: Strategic Behavior of Large Language Models: Game Structure vs.
Contextual Framing
- Authors: Nunzio Lorè, Babak Heydari
- Abstract summary: This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2.
Utilizing four canonical two-player games, we explore how these models navigate social dilemmas.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the strategic decision-making capabilities of three
Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2, within the framework
of game theory. Utilizing four canonical two-player games -- Prisoner's
Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight -- we explore how these
models navigate social dilemmas, situations where players can either cooperate
for a collective benefit or defect for individual gain. Crucially, we extend
our analysis to examine the role of contextual framing, such as diplomatic
relations or casual friendships, in shaping the models' decisions. Our findings
reveal a complex landscape: while GPT-3.5 is highly sensitive to contextual
framing, it shows limited ability to engage in abstract strategic reasoning.
Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and
context, but LLaMa-2 exhibits a more nuanced understanding of the games'
underlying mechanics. These results highlight the current limitations and
varied proficiencies of LLMs in strategic decision-making, cautioning against
their unqualified use in tasks requiring complex strategic reasoning.
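The four games named in the abstract differ only in how the payoffs of a symmetric 2x2 game are ordered. The following Python sketch illustrates that distinction using the standard textbook orderings; it is not the authors' code, and the payoff numbers are illustrative values rather than figures from the paper. The names `classify` and `pure_nash` are hypothetical helpers, and treating Prisoner's Delight as the game in which cooperation is the dominant strategy is an assumption based on common usage.

```python
from itertools import product

def classify(R, T, S, P):
    """Classify a symmetric 2x2 game from its payoff ordering.

    R: both cooperate, T: defect against a cooperator,
    S: cooperate against a defector, P: both defect.
    Orderings are the standard textbook ones (an assumption here,
    not taken from the paper).
    """
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if R > T > P > S:
        return "Stag Hunt"
    if T > R > S > P:
        return "Snowdrift"
    if R > T and S > P:
        return "Prisoner's Delight"  # cooperation dominates
    return "other"

def pure_nash(R, T, S, P):
    """Return the pure-strategy Nash equilibria as (row, column) moves."""
    # payoff[(mine, theirs)] is the payoff to the player choosing `mine`.
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    flip = {"C": "D", "D": "C"}
    equilibria = []
    for a, b in product("CD", repeat=2):
        row_ok = payoff[(a, b)] >= payoff[(flip[a], b)]
        col_ok = payoff[(b, a)] >= payoff[(flip[b], a)]
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria

# Illustrative payoffs only (not from the paper).
GAMES = {
    "pd": dict(R=3, T=5, S=0, P=1),
    "stag": dict(R=5, T=3, S=0, P=1),
    "snow": dict(R=3, T=5, S=1, P=0),
    "delight": dict(R=5, T=3, S=1, P=0),
}

if __name__ == "__main__":
    for key, g in GAMES.items():
        print(key, classify(**g), pure_nash(**g))
```

Under these orderings, mutual defection is the only pure equilibrium in the Prisoner's Dilemma and mutual cooperation is the only one in Prisoner's Delight, while Stag Hunt and Snowdrift each have two. This is one way to read what the abstract calls the games' "underlying mechanics".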
Related papers
- Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay [0.0]
We use games like Tic-Tac-Toe, Connect Four, and Battleship to assess strategic thinking and decision-making.
Despite their proficiency on standard benchmarks, GPT-3.5's and GPT-4's abilities to play and reason about fully observable games without pre-training are mediocre.
arXiv Detail & Related papers (2024-07-12T14:17:26Z)
- Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games [56.70628673595041]
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored.
We investigate LLMs' behaviour in two strategic games, Stag Hunt and Prisoner's Dilemma, analyzing performance variations under different settings and prompts.
Our results show that the tested state-of-the-art LLMs exhibit at least one of the following systematic biases: (1) positional bias, (2) payoff bias, or (3) behavioural bias.
arXiv Detail & Related papers (2024-07-05T12:30:02Z)
- GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents [4.209869303518743]
We introduce GameBench, a cross-domain benchmark for evaluating strategic reasoning abilities of large language models.
Our evaluations use GPT-3 and GPT-4 in their base form along with two scaffolding frameworks designed to enhance strategic reasoning ability: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP).
Our results show that none of the tested models match human performance, and at worst GPT-4 performs worse than random action.
arXiv Detail & Related papers (2024-06-07T00:28:43Z)
- How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments [83.78240828340681]
This research investigates Large Language Models' decision-making capabilities through the lens of Game Theory.
We focus specifically on games that support the participation of more than two agents simultaneously.
We introduce our framework, GAMA-Bench, including eight classical multi-agent games.
arXiv Detail & Related papers (2024-03-18T14:04:47Z)
- Can Large Language Models do Analytical Reasoning? [45.69642663863077]
This paper explores the analytical reasoning of cutting-edge Large Language Models on sports.
We find that GPT-4 stands out in effectiveness, followed by Claude-2.1, with GPT-3.5, Gemini-Pro, and Llama-2-70b lagging behind.
To our surprise, we observe that most models, including GPT-4, struggle to accurately count the total scores for NBA quarters despite showing strong performance in counting NFL quarter scores.
arXiv Detail & Related papers (2024-03-06T20:22:08Z)
- GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z)
- CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents [63.79739920174535]
We introduce CivRealm, an environment inspired by the Civilization game.
CivRealm stands as a unique learning and reasoning challenge for decision-making agents.
arXiv Detail & Related papers (2024-01-19T09:14:11Z)
- ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z)
- Strategic Reasoning with Language Models [35.63300060111918]
Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations.
Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.
This paper introduces an approach that uses pretrained Large Language Models with few-shot chain-of-thought examples to enable strategic reasoning for AI agents.
arXiv Detail & Related papers (2023-05-30T16:09:19Z)
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)