AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2308.03526v1
- Date: Mon, 7 Aug 2023 12:21:37 GMT
- Title: AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- Authors: Micha\"el Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar
Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad
\.Zo{\l}na, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama,
Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah
Henderson, Sergio G\'omez Colmenarejo, A\"aron van den Oord, Wojciech Marian
Czarnecki, Nando de Freitas, Oriol Vinyals
- Abstract summary: StarCraft II is one of the most challenging simulated reinforcement learning environments.
Blizzard has released a massive dataset of millions of StarCraft II games played by human players.
We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol.
- Score: 38.75717733273262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: StarCraft II is one of the most challenging simulated reinforcement learning
environments; it is partially observable, stochastic, multi-agent, and
mastering StarCraft II requires strategic planning over long time horizons with
real-time low-level execution. It also has an active professional competitive
scene. StarCraft II is uniquely suited for advancing offline RL algorithms,
both because of its challenging nature and because Blizzard has released a
massive dataset of millions of StarCraft II games played by human players. This
paper leverages that and establishes a benchmark, called AlphaStar Unplugged,
introducing unprecedented challenges for offline reinforcement learning. We
define a dataset (a subset of Blizzard's release), tools standardizing an API
for machine learning methods, and an evaluation protocol. We also present
baseline agents, including behavior cloning, offline variants of actor-critic
and MuZero. We improve the state of the art for agents trained using only
offline data, and we achieve a 90% win rate against the previously published
AlphaStar behavior cloning agent.
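Of the baselines above, behavior cloning is the simplest: it reduces offline RL to supervised learning on (observation, action) pairs extracted from human replays. Below is a minimal, hedged sketch of that idea on toy stand-in data; the model, dimensions, and batch are hypothetical and do not reflect the AlphaStar Unplugged architecture or API.

```python
# Minimal behavior-cloning sketch: supervised learning on (observation,
# action) pairs from human replays. Everything here (ToyPolicy, the random
# stand-in batch, the sizes) is hypothetical, not the paper's setup.
import torch
import torch.nn as nn

OBS_DIM, NUM_ACTIONS = 128, 32  # toy sizes; real SC2 obs/action spaces are far richer

class ToyPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS)
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

policy = ToyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for one batch of replay data: observations and the human's actions.
obs = torch.randn(64, OBS_DIM)
human_actions = torch.randint(0, NUM_ACTIONS, (64,))

logits = policy(obs)
loss = loss_fn(logits, human_actions)  # imitate the human action distribution
optimizer.zero_grad()
loss.backward()
optimizer.step()
```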
Related papers
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks [59.50879251101105]
We propose Hokoff, a comprehensive set of pre-collected datasets that covers offline RL and offline MARL.
This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game.
We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game.
arXiv Detail & Related papers (2024-08-20T05:38:50Z)
- Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach [7.693497788883165]
Large language model (LLM) agents, such as Voyager and MetaGPT, have shown immense potential for solving intricate tasks.
We propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information.
Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI.
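The two summarization stages compose into a simple pipeline: each raw frame is compressed into text, and a window of frame summaries is then aggregated into a game-level analysis that the LLM acts on. A hedged Python sketch follows; `call_llm` is a hypothetical stand-in for any text-generation API, and the prompt wording is illustrative rather than taken from the paper.

```python
# Hedged sketch of a two-stage summarization chain for an LLM game agent.
# `call_llm` is a hypothetical stand-in for any text-generation API; the
# prompt wording is illustrative and not taken from the paper.
from typing import List

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def single_frame_summary(raw_observation: str) -> str:
    # Stage 1: compress one raw game frame into a short textual summary.
    return call_llm(f"Summarize this StarCraft II observation:\n{raw_observation}")

def multi_frame_summary(frame_summaries: List[str]) -> str:
    # Stage 2: aggregate a window of frame summaries into game-level analysis.
    joined = "\n".join(frame_summaries)
    return call_llm(f"Given these recent frame summaries, analyze the game state:\n{joined}")

def decide_action(frames: List[str]) -> str:
    summaries = [single_frame_summary(f) for f in frames]
    analysis = multi_frame_summary(summaries)
    return call_llm(f"Based on this analysis, choose the next macro action:\n{analysis}")
```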
arXiv Detail & Related papers (2023-12-19T05:27:16Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for GuanDan, an exceptionally complex and popular card game.
We first put forward an AI program named DanZero for this game.
To further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
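Here, "policy-based" means optimizing the policy directly from sampled returns rather than through a learned value table. The generic REINFORCE update below illustrates the family of methods; it is not DanZero+'s actual algorithm.

```python
# Generic REINFORCE update, the simplest policy-based RL method. This
# illustrates the family of algorithms only; it is not DanZero+'s method.
import torch

def reinforce_step(policy, optimizer, log_probs, rewards, gamma=0.99):
    """One policy-gradient update from a single finished episode.

    log_probs: list of log pi(a_t | s_t) tensors collected during play.
    rewards:   list of scalar rewards r_t (assumes episode length > 1).
    """
    # Compute discounted returns G_t backwards through the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    # Normalize returns for variance reduction (common practice).
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Gradient ascent on E[log pi(a|s) * G], via minimizing the negative.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```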
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
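Regularizing a reward-maximizing policy toward an imitation policy has a clean closed form: maximizing $E_{a\sim\pi}[Q(a)] - \lambda\,\mathrm{KL}(\pi \| \tau)$ over action distributions yields $\pi(a) \propto \tau(a)\exp(Q(a)/\lambda)$, where $\tau$ is the human imitation-learned policy. The sketch below illustrates this generic mechanism numerically; it is not the DiL-piKL implementation.

```python
# Hedged sketch of KL-regularized policy selection: maximize
#   E_{a~pi}[Q(a)] - lam * KL(pi || tau)
# over distributions pi, whose closed-form solution is
#   pi(a) proportional to tau(a) * exp(Q(a) / lam).
# This illustrates the regularization idea generically, not DiL-piKL itself.
import numpy as np

def kl_regularized_policy(q_values, human_policy, lam):
    logits = np.log(human_policy) + q_values / lam
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

q = np.array([1.0, 0.5, 0.0])        # estimated action values
tau = np.array([0.1, 0.6, 0.3])      # human imitation policy
print(kl_regularized_policy(q, tau, lam=10.0))  # ~tau: strong regularization
print(kl_regularized_policy(q, tau, lam=0.1))   # ~greedy on q: weak regularization
```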
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- On Efficient Reinforcement Learning for Full-length Game of StarCraft II [21.768578136029987]
We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks.
On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI.
We improve our architecture to train the agent against the cheating-level AIs, achieving win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively.
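Such a hierarchical agent typically pairs a high-level policy that selects among extracted macro-actions with low-level execution of each macro as a fixed sequence of primitive actions. The sketch below is a generic illustration; the macro set and class names are hypothetical, not the paper's.

```python
# Generic sketch of a two-level hierarchical controller over macro-actions.
# The macro definitions and names are hypothetical illustrations, not the
# paper's extracted macro set.
import random

# Each macro-action expands to a fixed sequence of primitive game actions.
MACROS = {
    "build_worker": ["select_base", "train_scv"],
    "expand":       ["select_worker", "build_command_center"],
    "attack":       ["select_army", "attack_move_enemy_base"],
}

class HighLevelPolicy:
    def choose_macro(self, observation) -> str:
        # Stand-in for a trained network; here, uniform random over macros.
        return random.choice(list(MACROS))

def run_step(policy: HighLevelPolicy, observation) -> str:
    macro = policy.choose_macro(observation)
    for primitive in MACROS[macro]:  # low level: execute the chosen macro
        print(f"executing primitive action: {primitive}")
    return macro
```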
arXiv Detail & Related papers (2022-09-23T12:24:21Z)
- Applying supervised and reinforcement learning methods to create neural-network-based agents for playing StarCraft II [0.0]
We propose a neural network architecture for playing the full two-player match of StarCraft II trained with general-purpose supervised and reinforcement learning.
Our implementation achieves non-trivial performance when compared to the in-game scripted bots.
arXiv Detail & Related papers (2021-09-26T20:08:10Z)
- SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II [15.612456049715123]
AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve.
We propose a deep reinforcement learning agent, StarCraft Commander (SCC).
SCC demonstrates top human performance, defeating GrandMaster players in test matches and top professional players in a live event.
arXiv Detail & Related papers (2020-12-24T08:43:44Z)
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game [25.248034258354533]
Recently, Google's DeepMind announced AlphaStar, a grandmaster-level AI in StarCraft II that can play with humans using a comparable action space and operations.
In this paper, we introduce a new AI agent, named TStarBot-X, that is trained with orders of magnitude less computation and can play competitively with expert human players.
arXiv Detail & Related papers (2020-11-27T13:31:49Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
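For context, $\tilde{\mathcal{O}}$ hides logarithmic factors, and a sublinear regret bound means the average per-step suboptimality vanishes; in generic notation (not the paper's exact definitions):

```latex
% Hedged sketch, in generic notation (not the paper's exact definitions):
% cumulative regret compares realized value against the optimal strategy.
\mathrm{Regret}(T) = \sum_{t=1}^{T}\bigl(V^{\star} - V^{\pi_t}\bigr)
  = \tilde{\mathcal{O}}(\sqrt{T})
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} = \tilde{\mathcal{O}}\bigl(T^{-1/2}\bigr)
  \xrightarrow{\;T\to\infty\;} 0.
```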
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.