PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red
- URL: http://arxiv.org/abs/2506.23689v1
- Date: Mon, 30 Jun 2025 10:09:13 GMT
- Title: PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red
- Authors: Zihao Liu, Xinhang Sui, Yueran Song, Siwen Wang,
- Abstract summary: We introduce PokéAI, the first text-based, multi-agent large language model (LLM) framework designed to autonomously play and progress through Pokémon Red. Our system consists of three specialized agents (Planning, Execution, and Critique), each with its own memory bank, role, and skill set.
- Score: 4.558478169296784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce PokéAI, the first text-based, multi-agent large language model (LLM) framework designed to autonomously play and progress through Pokémon Red. Our system consists of three specialized agents (Planning, Execution, and Critique), each with its own memory bank, role, and skill set. The Planning Agent functions as the central brain, generating tasks to progress through the game. These tasks are then delegated to the Execution Agent, which carries them out within the game environment. Upon task completion, the Critique Agent evaluates the outcome to determine whether the objective was successfully achieved. Once verification is complete, control returns to the Planning Agent, forming a closed-loop decision-making system. As a preliminary step, we developed a battle module within the Execution Agent. Our results show that the battle AI achieves an average win rate of 80.8% across 50 wild encounters, only 6% lower than the performance of an experienced human player. Furthermore, we find that a model's battle performance correlates strongly with its LLM Arena score on language-related tasks, indicating a meaningful link between linguistic ability and strategic reasoning. Finally, our analysis of gameplay logs reveals that each LLM exhibits a unique playstyle, suggesting that individual models develop distinct strategic behaviors.
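The closed loop described in the abstract (plan, execute, critique, return to planning) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` class, `run_closed_loop`, and the three stub callables are hypothetical names chosen for illustration, the memory bank is reduced to a plain list, and no LLM or game emulator is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """One role in the loop with its own memory bank (a plain list here)."""
    name: str
    memory: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.memory.append(note)


def run_closed_loop(
    propose_task: Callable[[Agent], str],
    execute_task: Callable[[Agent, str], str],
    judge_outcome: Callable[[Agent, str, str], bool],
    max_steps: int = 3,
) -> List[Tuple[str, str, bool]]:
    """Run plan -> execute -> critique cycles and return the step history."""
    planner = Agent("Planning")
    executor = Agent("Execution")
    critic = Agent("Critique")
    history: List[Tuple[str, str, bool]] = []
    for _ in range(max_steps):
        task = propose_task(planner)                    # Planning Agent generates a task
        outcome = execute_task(executor, task)          # Execution Agent carries it out
        success = judge_outcome(critic, task, outcome)  # Critique Agent verifies the result
        # Each agent records the step in its own memory before control
        # returns to the planner for the next cycle.
        planner.remember(f"{task}: {'done' if success else 'failed'}")
        executor.remember(outcome)
        critic.remember(f"{task} -> {success}")
        history.append((task, outcome, success))
    return history


# Hypothetical stand-ins for the LLM-backed behaviour (assumptions, not the paper's logic).
def propose_task(planner: Agent) -> str:
    return f"task-{len(planner.memory) + 1}"


def execute_task(executor: Agent, task: str) -> str:
    return f"attempted {task}"


def judge_outcome(critic: Agent, task: str, outcome: str) -> bool:
    return task in outcome


history = run_closed_loop(propose_task, execute_task, judge_outcome)
for task, outcome, success in history:
    print(task, "|", outcome, "|", success)
```

Passing the three behaviours in as callables keeps the loop itself independent of how each agent decides, so the stubs could in principle be swapped for model-backed functions without changing the control flow.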
Related papers
- Cultivating Game Sense for Yourself: Making VLMs Gaming Experts [23.370716496046217]
We propose a paradigm shift in gameplay agent design. Instead of directly controlling gameplay, VLM develops specialized execution modules tailored for tasks like shooting and combat. These modules handle real-time game interactions, elevating VLM to a high-level developer.
arXiv Detail & Related papers (2025-03-27T08:40:47Z)
- AVA: Attentive VLM Agent for Mastering StarCraft II [56.07921367623274]
We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Our agent addresses this limitation by incorporating RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay.
arXiv Detail & Related papers (2025-03-07T12:54:25Z)
- Hybrid Voting-Based Task Assignment in Role-Playing Games [0.0]
Voting-Based Task Assignment (VBTA) is a framework inspired by human reasoning in task allocation and completion. VBTA efficiently identifies and assigns the most suitable agent to each task. Our method shows promise when generating both unique combat encounters and narratives.
arXiv Detail & Related papers (2025-02-25T22:58:21Z)
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games [21.639516389561837]
We introduce WellPlay, a reasoning dataset for multi-agent conversational inference in Murder Mystery Games (MMGs). WellPlay comprises 1,482 inferential questions across 12 games, spanning objectives, reasoning, and relationship understanding. We present PLAYER*, a novel framework for Large Language Model (LLM)-based agents in MMGs.
arXiv Detail & Related papers (2024-04-26T19:07:30Z)
- Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games [26.07074182316433]
We introduce the first dataset specifically for Jubensha, including character scripts and game rules.
Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in this game.
To evaluate the gaming performance of these AI agents, we developed novel methods measuring their mastery of case information and reasoning skills.
arXiv Detail & Related papers (2023-12-01T17:33:57Z)
- LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay [55.12945794835791]
Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay.
We propose a novel framework, tailored for Avalon, that features a multi-agent system facilitating efficient communication and interaction.
Results affirm the framework's effectiveness in creating adaptive agents and suggest LLM-based agents' potential in navigating dynamic social interactions.
arXiv Detail & Related papers (2023-10-23T14:35:26Z)
- SmartPlay: A Benchmark for LLMs as Intelligent Agents [45.76707302899935]
SmartPlay consists of 6 different games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft.
Each game challenges a subset of 9 important capabilities of an intelligent LLM agent.
Tests include reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness.
arXiv Detail & Related papers (2023-10-02T18:52:11Z)
- Teamwork under extreme uncertainty: AI for Pokemon ranks 33rd in the world [0.0]
This paper describes the mechanics of the game and presents a game analysis.
We propose unique AI algorithms based on our understanding that the two biggest challenges in the game are keeping a balanced team and dealing with three sources of uncertainty.
Our AI agent performed significantly better than all previous attempts and peaked at the 33rd place in the world, in one of the most popular battle formats, while running on only 4 single socket servers.
arXiv Detail & Related papers (2022-12-27T01:52:52Z)
- Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2022-05-27T02:21:04Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft Attribution maps and outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds [47.7511759322784]
We seek to create agents that both act and communicate with other agents in pursuit of a goal.
We introduce a reinforcement learning system that incorporates large-scale language modeling-based and commonsense reasoning-based pre-training.
We conduct zero-shot evaluations using held-out human expert demonstrations, showing that our agents are able to act consistently and talk naturally with respect to their motivations.
arXiv Detail & Related papers (2020-10-01T21:06:21Z)
- Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks [48.5733173329785]
We present Neural MMO, a massively multiagent game environment inspired by MMOs.
We discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
arXiv Detail & Related papers (2020-01-31T18:50:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.