On Efficient Reinforcement Learning for Full-length Game of StarCraft II
- URL: http://arxiv.org/abs/2209.11553v1
- Date: Fri, 23 Sep 2022 12:24:21 GMT
- Title: On Efficient Reinforcement Learning for Full-length Game of StarCraft II
- Authors: Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu
- Abstract summary: We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks.
On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI.
We improve our architecture to train the agent against the cheating-level AIs and achieve win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively.
- Score: 21.768578136029987
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL),
of which the main difficulties include huge state space, varying action space,
and a long time horizon. In this work, we investigate a set of RL techniques
for the full-length game of StarCraft II: a hierarchical RL approach involving
extracted macro-actions and a hierarchical architecture of neural networks, and
a curriculum transfer training procedure that lets us train the agent on a
single machine with 4 GPUs and 48 CPU threads. On a 64x64
map and using restrictive units, we achieve a win rate of 99% against the
level-1 built-in AI. Through the curriculum transfer learning algorithm and a
mixture of combat models, we achieve a 93% win rate against the most difficult
non-cheating level built-in AI (level-7). In this extended version of the
paper, we improve our architecture to train the agent against the cheating-level
AIs and achieve win rates of 96%, 97%, and 94% against the level-8, level-9,
and level-10 AIs, respectively. Our code is at
https://github.com/liuruoze/HierNet-SC2. To provide an AlphaStar-style baseline
for our work as well as for the research and open-source community, we
reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version
of mAS is 1.07; it can be trained on the raw action space, which has 564
actions, and is designed to run training on a single commodity machine by
making the hyper-parameters adjustable. We then compare our work with mAS using the
same resources and show that our method is more effective. The code of
mini-AlphaStar is at https://github.com/liuruoze/mini-AlphaStar. We hope our
study sheds some light on future research into efficient reinforcement
learning for SC2 and other large-scale games.
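As a rough illustration of the hierarchical scheme described in the abstract, the sketch below shows a top-level controller choosing among a small set of extracted macro-actions, with a per-macro sub-policy expanding that choice into low-level actions. The macro-action list, network sizes, and class names (ControllerPolicy, SubPolicy) are illustrative assumptions and not the authors' implementation; see the HierNet-SC2 repository for the actual architecture.

```python
# Minimal sketch of a hierarchical policy over extracted macro-actions.
# All names, sizes, and the macro-action set below are illustrative
# assumptions; they do not reproduce the paper's exact design.
import torch
import torch.nn as nn

# Hypothetical macro-action set (the paper extracts its set from the game;
# this list is only an example).
MACRO_ACTIONS = [
    "build_worker", "build_supply", "build_barracks",
    "train_marine", "attack", "defend", "scout",
]

class ControllerPolicy(nn.Module):
    """Top-level policy: picks a macro-action from abstract game features."""
    def __init__(self, state_dim: int, num_macros: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, num_macros)  # macro-action logits
        self.value_head = nn.Linear(hidden, 1)            # state value for RL

    def forward(self, state: torch.Tensor):
        h = self.net(state)
        return self.policy_head(h), self.value_head(h)

class SubPolicy(nn.Module):
    """Low-level policy: expands one macro-action into micro-action logits."""
    def __init__(self, state_dim: int, num_micro: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_micro),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage: one decision step of the hierarchy.
state_dim, num_micro = 32, 10
controller = ControllerPolicy(state_dim, len(MACRO_ACTIONS))
sub_policies = nn.ModuleList(SubPolicy(state_dim, num_micro) for _ in MACRO_ACTIONS)

state = torch.randn(1, state_dim)                        # abstract game features
macro_logits, value = controller(state)
macro_id = torch.distributions.Categorical(logits=macro_logits).sample().item()
micro_logits = sub_policies[macro_id](state)             # low-level action logits
print(MACRO_ACTIONS[macro_id], micro_logits.shape)
```

The usual motivation for such macro-actions, and the one the abstract appeals to, is that deciding over a handful of macros shrinks the effective action space and decision horizon relative to the raw SC2 interface.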
Related papers
- Reinforcement Learning for High-Level Strategic Control in Tower Defense Games [47.618236610219554]
In strategy games, one of the most important aspects of game design is maintaining a sense of challenge for players.
We propose an automated approach that combines traditional scripted methods with reinforcement learning.
Results show that combining a learned approach, such as reinforcement learning, with a scripted AI produces a higher-performing and more robust agent than either approach alone.
arXiv Detail & Related papers (2024-06-12T08:06:31Z) - DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z) - Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks, both on real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z) - AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning [38.75717733273262]
StarCraft II is one of the most challenging simulated reinforcement learning environments.
Blizzard has released a massive dataset of millions of StarCraft II games played by human players.
We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol.
arXiv Detail & Related papers (2023-08-07T12:21:37Z) - DanZero: Mastering GuanDan Game with Reinforcement Learning [121.93690719186412]
Card game AI has always been a hot topic in the research of artificial intelligence.
In this paper, we are devoted to developing an AI program for a more complex card game, GuanDan.
We propose DanZero, the first AI program for GuanDan, using reinforcement learning techniques.
arXiv Detail & Related papers (2022-10-31T06:29:08Z) - Mastering the Game of No-Press Diplomacy via Human-Regularized
Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z) - Applying supervised and reinforcement learning methods to create
neural-network-based agents for playing StarCraft II [0.0]
We propose a neural network architecture for playing the full two-player match of StarCraft II trained with general-purpose supervised and reinforcement learning.
Our implementation achieves non-trivial performance when compared to the in-game scripted bots.
arXiv Detail & Related papers (2021-09-26T20:08:10Z) - An Introduction of mini-AlphaStar [22.820438931820764]
An SC2 agent called AlphaStar is proposed, which shows excellent performance, obtaining a high win rate of 99.8% against Grandmaster-level human players.
We implemented a mini-scaled version of it called mini-AlphaStar based on their paper and the pseudocode they provided.
The objective of mini-AlphaStar is to provide a reproduction of the original AlphaStar and facilitate the future research of RL on large-scale problems.
arXiv Detail & Related papers (2021-04-14T14:31:51Z) - SCC: an efficient deep reinforcement learning agent mastering the game
of StarCraft II [15.612456049715123]
AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve.
We propose a deep reinforcement learning agent, StarCraft Commander (SCC).
SCC demonstrates top human performance, defeating GrandMaster players in test matches and top professional players in a live event.
arXiv Detail & Related papers (2020-12-24T08:43:44Z) - TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League
Training in StarCraft II Full Game [25.248034258354533]
Recently, Google's DeepMind announced AlphaStar, a grandmaster level AI in StarCraft II that can play with humans using comparable action space and operations.
In this paper, we introduce a new AI agent, named TStarBot-X, that is trained with orders of magnitude less computation and can play competitively with expert human players.
arXiv Detail & Related papers (2020-11-27T13:31:49Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)