AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2308.03526v1
- Date: Mon, 7 Aug 2023 12:21:37 GMT
- Title: AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- Authors: Micha\"el Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar
Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad
\.Zo{\l}na, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama,
Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah
Henderson, Sergio G\'omez Colmenarejo, A\"aron van den Oord, Wojciech Marian
Czarnecki, Nando de Freitas, Oriol Vinyals
- Abstract summary: StarCraft II is one of the most challenging simulated reinforcement learning environments.
Blizzard has released a massive dataset of millions of StarCraft II games played by human players.
We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol.
- Score: 38.75717733273262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: StarCraft II is one of the most challenging simulated reinforcement learning
environments; it is partially observable, stochastic, multi-agent, and
mastering StarCraft II requires strategic planning over long time horizons with
real-time low-level execution. It also has an active professional competitive
scene. StarCraft II is uniquely suited for advancing offline RL algorithms,
both because of its challenging nature and because Blizzard has released a
massive dataset of millions of StarCraft II games played by human players. This
paper leverages that and establishes a benchmark, called AlphaStar Unplugged,
introducing unprecedented challenges for offline reinforcement learning. We
define a dataset (a subset of Blizzard's release), tools standardizing an API
for machine learning methods, and an evaluation protocol. We also present
baseline agents, including behavior cloning, offline variants of actor-critic
and MuZero. We improve the state of the art for agents trained using only
offline data, and we achieve a 90% win rate against the previously published
AlphaStar behavior cloning agent.
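Of the baselines above, behavior cloning is the simplest: it reduces offline RL to supervised learning on (observation, action) pairs extracted from human replays. Below is a minimal, hedged sketch of that idea on toy stand-in data; the model, dimensions, and batch are hypothetical and do not reflect the AlphaStar Unplugged architecture or API.

```python
# Minimal behavior-cloning sketch: supervised learning on (observation,
# action) pairs from human replays. Everything here (ToyPolicy, the random
# stand-in batch, the sizes) is hypothetical, not the paper's setup.
import torch
import torch.nn as nn

OBS_DIM, NUM_ACTIONS = 128, 32  # toy sizes; real SC2 obs/action spaces are far richer

class ToyPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS)
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

policy = ToyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for one batch of replay data: observations and the human's actions.
obs = torch.randn(64, OBS_DIM)
human_actions = torch.randint(0, NUM_ACTIONS, (64,))

logits = policy(obs)
loss = loss_fn(logits, human_actions)  # imitate the human action distribution
optimizer.zero_grad()
loss.backward()
optimizer.step()
```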
Related papers
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks [59.50879251101105]
We propose Hokoff, a comprehensive set of pre-collected datasets that covers offline RL and offline MARL.
This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game.
We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game.
arXiv Detail & Related papers (2024-08-20T05:38:50Z)
- Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach [7.693497788883165]
Large language model (LLM) agents, such as Voyager and MetaGPT, have shown immense potential for solving intricate tasks.
We propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information.
Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI.
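The two summarization stages compose into a simple pipeline: each raw frame is compressed into text, and a window of frame summaries is then aggregated into a game-level analysis that the LLM acts on. A hedged Python sketch follows; `call_llm` is a hypothetical stand-in for any text-generation API, and the prompt wording is illustrative rather than taken from the paper.

```python
# Hedged sketch of a two-stage summarization chain for an LLM game agent.
# `call_llm` is a hypothetical stand-in for any text-generation API; the
# prompt wording is illustrative and not taken from the paper.
from typing import List

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def single_frame_summary(raw_observation: str) -> str:
    # Stage 1: compress one raw game frame into a short textual summary.
    return call_llm(f"Summarize this StarCraft II observation:\n{raw_observation}")

def multi_frame_summary(frame_summaries: List[str]) -> str:
    # Stage 2: aggregate a window of frame summaries into game-level analysis.
    joined = "\n".join(frame_summaries)
    return call_llm(f"Given these recent frame summaries, analyze the game state:\n{joined}")

def decide_action(frames: List[str]) -> str:
    summaries = [single_frame_summary(f) for f in frames]
    analysis = multi_frame_summary(summaries)
    return call_llm(f"Based on this analysis, choose the next macro action:\n{analysis}")
```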
arXiv Detail & Related papers (2023-12-19T05:27:16Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for GuanDan, an exceptionally complex and popular card game.
We first put forward an AI program named DanZero for this game.
To further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
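Here, "policy-based" means optimizing the policy directly from sampled returns rather than through a learned value table. The generic REINFORCE update below illustrates the family of methods; it is not DanZero+'s actual algorithm.

```python
# Generic REINFORCE update, the simplest policy-based RL method. This
# illustrates the family of algorithms only; it is not DanZero+'s method.
import torch

def reinforce_step(policy, optimizer, log_probs, rewards, gamma=0.99):
    """One policy-gradient update from a single finished episode.

    log_probs: list of log pi(a_t | s_t) tensors collected during play.
    rewards:   list of scalar rewards r_t (assumes episode length > 1).
    """
    # Compute discounted returns G_t backwards through the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    # Normalize returns for variance reduction (common practice).
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Gradient ascent on E[log pi(a|s) * G], via minimizing the negative.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```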
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
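Regularizing a reward-maximizing policy toward an imitation policy has a clean closed form: maximizing $E_{a\sim\pi}[Q(a)] - \lambda\,\mathrm{KL}(\pi \| \tau)$ over action distributions yields $\pi(a) \propto \tau(a)\exp(Q(a)/\lambda)$, where $\tau$ is the human imitation-learned policy. The sketch below illustrates this generic mechanism numerically; it is not the DiL-piKL implementation.

```python
# Hedged sketch of KL-regularized policy selection: maximize
#   E_{a~pi}[Q(a)] - lam * KL(pi || tau)
# over distributions pi, whose closed-form solution is
#   pi(a) proportional to tau(a) * exp(Q(a) / lam).
# This illustrates the regularization idea generically, not DiL-piKL itself.
import numpy as np

def kl_regularized_policy(q_values, human_policy, lam):
    logits = np.log(human_policy) + q_values / lam
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

q = np.array([1.0, 0.5, 0.0])        # estimated action values
tau = np.array([0.1, 0.6, 0.3])      # human imitation policy
print(kl_regularized_policy(q, tau, lam=10.0))  # ~tau: strong regularization
print(kl_regularized_policy(q, tau, lam=0.1))   # ~greedy on q: weak regularization
```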
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- On Efficient Reinforcement Learning for Full-length Game of StarCraft II [21.768578136029987]
We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks.
On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI.
We improve our architecture to train the agent against the cheating-level AIs, achieving win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively.
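Such a hierarchical agent typically pairs a high-level policy that selects among extracted macro-actions with low-level execution of each macro as a fixed sequence of primitive actions. The sketch below is a generic illustration; the macro set and class names are hypothetical, not the paper's.

```python
# Generic sketch of a two-level hierarchical controller over macro-actions.
# The macro definitions and names are hypothetical illustrations, not the
# paper's extracted macro set.
import random

# Each macro-action expands to a fixed sequence of primitive game actions.
MACROS = {
    "build_worker": ["select_base", "train_scv"],
    "expand":       ["select_worker", "build_command_center"],
    "attack":       ["select_army", "attack_move_enemy_base"],
}

class HighLevelPolicy:
    def choose_macro(self, observation) -> str:
        # Stand-in for a trained network; here, uniform random over macros.
        return random.choice(list(MACROS))

def run_step(policy: HighLevelPolicy, observation) -> str:
    macro = policy.choose_macro(observation)
    for primitive in MACROS[macro]:  # low level: execute the chosen macro
        print(f"executing primitive action: {primitive}")
    return macro
```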
arXiv Detail & Related papers (2022-09-23T12:24:21Z)
- Applying supervised and reinforcement learning methods to create neural-network-based agents for playing StarCraft II [0.0]
We propose a neural network architecture for playing the full two-player match of StarCraft II trained with general-purpose supervised and reinforcement learning.
Our implementation achieves non-trivial performance when compared to the in-game scripted bots.
arXiv Detail & Related papers (2021-09-26T20:08:10Z)
- SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II [15.612456049715123]
AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve.
We propose a deep reinforcement learning agent, StarCraft Commander (SCC).
SCC demonstrates top human performance, defeating GrandMaster players in test matches and top professional players in a live event.
arXiv Detail & Related papers (2020-12-24T08:43:44Z)
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game [25.248034258354533]
Recently, Google's DeepMind announced AlphaStar, a grandmaster-level AI in StarCraft II that can play with humans using a comparable action space and operations.
In this paper, we introduce a new AI agent, named TStarBot-X, that is trained with orders of magnitude less computation and can play competitively with expert human players.
arXiv Detail & Related papers (2020-11-27T13:31:49Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
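For context, $\tilde{\mathcal{O}}$ hides logarithmic factors, and a sublinear regret bound means the average per-step suboptimality vanishes; in generic notation (not the paper's exact definitions):

```latex
% Hedged sketch, in generic notation (not the paper's exact definitions):
% cumulative regret compares realized value against the optimal strategy.
\mathrm{Regret}(T) = \sum_{t=1}^{T}\bigl(V^{\star} - V^{\pi_t}\bigr)
  = \tilde{\mathcal{O}}(\sqrt{T})
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} = \tilde{\mathcal{O}}\bigl(T^{-1/2}\bigr)
  \xrightarrow{\;T\to\infty\;} 0.
```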
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.