A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models
- URL: http://arxiv.org/abs/2508.01623v1
- Date: Sun, 03 Aug 2025 07:27:36 GMT
- Title: A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models
- Authors: Tadisetty Sai Yashwanth, Dhatri C
- Abstract summary: This research presents LLM Pokemon League, a competitive tournament system that leverages Large Language Models (LLMs) as intelligent agents to simulate strategic decision-making in Pokémon battles. The platform is designed to analyze and compare the reasoning, adaptability, and tactical depth exhibited by different LLMs in a type-based, turn-based combat environment. The project enables rich exploration into comparative AI behavior, battle psychology, and meta-strategy development in constrained, rule-based game environments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research presents LLM Pokemon League, a competitive tournament system that leverages Large Language Models (LLMs) as intelligent agents to simulate strategic decision-making in Pokémon battles. The platform is designed to analyze and compare the reasoning, adaptability, and tactical depth exhibited by different LLMs in a type-based, turn-based combat environment. By structuring the competition as a single-elimination tournament involving diverse AI trainers, the system captures detailed decision logs, including team-building rationale, action selection strategies, and switching decisions. The project enables rich exploration into comparative AI behavior, battle psychology, and meta-strategy development in constrained, rule-based game environments. Through this system, we investigate how modern LLMs understand, adapt, and optimize decisions under uncertainty, making Pokémon League a novel benchmark for AI research in strategic reasoning and competitive learning.
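To make the described setup concrete, below is a minimal Python sketch of what a single-elimination tournament loop with per-agent decision logging might look like. The class and function names (LLMTrainer, simulate_battle, single_elimination) and the placeholder battle logic are illustrative assumptions, not details taken from the paper; a real implementation would prompt actual LLMs and resolve battles with proper type and move mechanics.

```python
# Minimal sketch of a single-elimination tournament with per-agent decision
# logging. All names and the placeholder battle logic are illustrative
# assumptions; the actual system would prompt real LLMs for team building,
# move selection, and switching decisions.
import json
import random
from dataclasses import dataclass, field


@dataclass
class LLMTrainer:
    name: str
    model_id: str
    decision_log: list = field(default_factory=list)

    def choose_action(self, battle_state: dict) -> dict:
        # Placeholder: a real agent would prompt its LLM with the battle state
        # and parse the chosen move or switch from the response.
        action = {"turn": battle_state["turn"], "choice": "placeholder_move"}
        self.decision_log.append(action)
        return action


def simulate_battle(a: LLMTrainer, b: LLMTrainer, max_turns: int = 20) -> LLMTrainer:
    """Run a toy turn-based battle and return the winner."""
    for turn in range(max_turns):
        a.choose_action({"turn": turn, "opponent": b.name})
        b.choose_action({"turn": turn, "opponent": a.name})
    # A real simulator would resolve moves, type matchups, and HP; here the
    # outcome is random so the bracket logic stays self-contained.
    return random.choice([a, b])


def single_elimination(trainers: list) -> LLMTrainer:
    """Advance winners round by round until a single champion remains."""
    bracket = list(trainers)
    while len(bracket) > 1:
        bracket = [simulate_battle(x, y) for x, y in zip(bracket[::2], bracket[1::2])]
    return bracket[0]


if __name__ == "__main__":
    agents = [LLMTrainer(f"Trainer-{i}", f"model-{i}") for i in range(8)]
    champion = single_elimination(agents)
    print("Champion:", champion.name)
    # Persist decision logs for later analysis of strategy and switching behaviour.
    with open("decision_logs.json", "w") as f:
        json.dump({t.name: t.decision_log for t in agents}, f, indent=2)
```

The bracket pairing via zip(bracket[::2], bracket[1::2]) assumes a power-of-two field; byes or seeding would need extra handling.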
Related papers
- Who is a Better Player: LLM against LLM [53.46608216197315]
We propose an adversarial benchmarking framework to assess the comprehensive performance of Large Language Models (LLMs) through board game competitions. We introduce Qi Town, a specialized evaluation platform that supports 5 widely played games and involves 20 LLM-driven players.
arXiv Detail & Related papers (2025-08-05T06:41:47Z) - Can LLMs Play Ô Ăn Quan Game? A Study of Multi-Step Planning and Decision Making [3.827471128756051]
We explore the ability of large language models (LLMs) to plan and make decisions through the lens of the traditional Vietnamese board game Ô Ăn Quan. Specifically, we develop various agent personas, ranging from aggressive to defensive, and employ the Ô Ăn Quan game as a testbed for assessing LLM performance across different strategies.
arXiv Detail & Related papers (2025-07-04T16:50:40Z) - Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search [32.657454056329875]
Large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making. We introduce STRATEGIST, a novel approach that combines LLM-based strategy learning with bi-level tree search. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information.
arXiv Detail & Related papers (2024-08-20T08:22:04Z) - GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z) - ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z) - Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z) - All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents must learn decisions that maximize their own goals while minimizing their adversaries' goals.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, map the strategy of specific opponents, and learn how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z) - Strategic Reasoning with Language Models [35.63300060111918]
Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations.
Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.
This paper introduces an approach that uses pretrained Large Language Models with few-shot chain-of-thought examples to enable strategic reasoning for AI agents.
arXiv Detail & Related papers (2023-05-30T16:09:19Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents by a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z)