Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers
- URL: http://arxiv.org/abs/2504.04395v1
- Date: Sun, 06 Apr 2025 07:35:15 GMT
- Title: Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers
- Authors: Jake Grigsby, Yuqi Xie, Justin Sasek, Steven Zheng, Yuke Zhu
- Abstract summary: Competitive Pokémon Singles (CPS) is a popular strategy game where players learn to exploit their opponent based on imperfect information. We develop a pipeline to reconstruct the first-person perspective of an agent from logs saved from the third-person perspective of a spectator. This dataset enables a black-box approach where we train large sequence models to adapt to their opponent based solely on their input trajectory.
- Score: 24.201490513370523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Competitive Pokémon Singles (CPS) is a popular strategy game where players learn to exploit their opponent based on imperfect information in battles that can last more than one hundred stochastic turns. AI research in CPS has been led by heuristic tree search and online self-play, but the game may also create a platform to study adaptive policies trained offline on large datasets. We develop a pipeline to reconstruct the first-person perspective of an agent from logs saved from the third-person perspective of a spectator, thereby unlocking a dataset of real human battles spanning more than a decade that grows larger every day. This dataset enables a black-box approach where we train large sequence models to adapt to their opponent based solely on their input trajectory while selecting moves without explicit search of any kind. We study a progression from imitation learning to offline RL and offline fine-tuning on self-play data in the hardcore competitive setting of Pokémon's four oldest (and most partially observed) game generations. The resulting agents outperform a recent LLM Agent approach and a strong heuristic search engine. While playing anonymously in online battles against humans, our best agents climb to rankings inside the top 10% of active players.
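As a rough illustration of the black-box approach described above, the sketch below shows a trajectory-conditioned Transformer policy that picks a battle action from the agent's first-person observation history alone, trained by imitation on reconstructed human battles. This is not the authors' implementation: the observation encoding, dimensions, action space, and architecture hyperparameters are all assumptions made for the example, and the paper's offline RL and self-play fine-tuning stages would add further machinery on top of such a backbone.

```python
# Minimal sketch (not the authors' code): a trajectory-conditioned policy that
# selects a Pokémon battle action from the agent's observation history alone,
# with no search at decision time. Dimensions and names are illustrative.
import torch
import torch.nn as nn

OBS_DIM = 512      # hypothetical size of an encoded per-turn observation
NUM_ACTIONS = 13   # hypothetical action space, e.g. 4 moves + switches

class TrajectoryPolicy(nn.Module):
    def __init__(self, d_model=256, n_layers=4, n_heads=8, max_turns=128):
        super().__init__()
        self.obs_proj = nn.Linear(OBS_DIM, d_model)
        self.pos_emb = nn.Embedding(max_turns, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, NUM_ACTIONS)

    def forward(self, obs_seq):
        # obs_seq: (batch, turns, OBS_DIM) -- one encoded observation per turn.
        batch, turns, _ = obs_seq.shape
        pos = torch.arange(turns, device=obs_seq.device)
        x = self.obs_proj(obs_seq) + self.pos_emb(pos)
        # Causal mask so turn t attends only to turns <= t; adaptation to the
        # opponent happens purely through this growing in-context history.
        mask = nn.Transformer.generate_square_subsequent_mask(turns).to(obs_seq.device)
        h = self.encoder(x, mask=mask)
        return self.action_head(h)  # (batch, turns, NUM_ACTIONS) action logits

# Imitation-learning-style training step on (reconstructed) human battles:
policy = TrajectoryPolicy()
obs = torch.randn(8, 30, OBS_DIM)                  # 8 battles, 30 turns each
human_actions = torch.randint(0, NUM_ACTIONS, (8, 30))
logits = policy(obs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, NUM_ACTIONS), human_actions.reshape(-1)
)
loss.backward()
```

Because the encoder is causal over the whole battle, the policy can only adapt to its opponent through the accumulating context, which matches the in-context adaptation the abstract describes; no tree search is involved when choosing a move.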
Related papers
- Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Optimized World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens of data. Our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.
arXiv Detail & Related papers (2024-10-01T10:25:03Z) - Offline Fictitious Self-Play for Competitive Games [34.445740191223614]
This paper introduces Off-FSP, the first practical model-free offline RL algorithm for competitive games.
arXiv Detail & Related papers (2024-02-29T11:36:48Z) - Behavioural Cloning in VizDoom [1.4999444543328293]
This paper describes methods for training autonomous agents to play the game "Doom 2" through Imitation Learning (IL).
We also explore how Reinforcement Learning (RL) compares to IL for humanness by comparing camera movement and trajectory data.
arXiv Detail & Related papers (2024-01-08T16:15:43Z) - AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning [38.75717733273262]
StarCraft II is one of the most challenging simulated reinforcement learning environments.
Blizzard has released a massive dataset of millions of StarCraft II games played by human players.
We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol.
arXiv Detail & Related papers (2023-08-07T12:21:37Z) - Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning [8.628547849796615]
Asymmetrical multiplayer (AMP) games are a popular genre in which multiple types of agents compete or collaborate.
It is difficult to train agents that can defeat top human players in AMP games with typical self-play training because of the unbalanced characteristics of their asymmetrical environments.
We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP games.
arXiv Detail & Related papers (2023-04-20T07:14:32Z) - Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using networks of up to 80 million parameters.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z) - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z) - Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning [86.37438204416435]
Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered.
Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome.
DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform.
arXiv Detail & Related papers (2022-06-30T15:53:19Z) - Multi-Game Decision Transformers [49.257185338595434]
We show that a single transformer-based model can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning.
We find that our Multi-Game Decision Transformer models offer the best scalability and performance.
arXiv Detail & Related papers (2022-05-30T16:55:38Z) - Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples.
arXiv Detail & Related papers (2021-07-01T16:26:54Z) - Counter-Strike Deathmatch with Large-Scale Behavioural Cloning [34.22811814104069]
This paper describes an AI agent that plays the popular first-person-shooter (FPS) video game 'Counter-Strike: Global Offensive' from pixel input.
The agent, a deep neural network, matches the performance of the medium difficulty built-in AI on the deathmatch game mode, whilst adopting a humanlike play style.
arXiv Detail & Related papers (2021-04-09T09:12:12Z)