Warm-Start AlphaZero Self-Play Search Enhancements
- URL: http://arxiv.org/abs/2004.12357v1
- Date: Sun, 26 Apr 2020 11:48:53 GMT
- Title: Warm-Start AlphaZero Self-Play Search Enhancements
- Authors: Hui Wang, Mike Preuss, Aske Plaat
- Abstract summary: Recently, AlphaZero has achieved landmark results in deep reinforcement learning.
We propose a novel approach to deal with this cold-start problem by employing simple search enhancements.
Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games.
- Score: 5.096685900776467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, AlphaZero has achieved landmark results in deep reinforcement
learning, by providing a single self-play architecture that learned three
different games at superhuman level. AlphaZero is a large and complicated
system with many parameters, and success requires much compute power and
fine-tuning. Reproducing results in other games is a challenge, and many
researchers are looking for ways to improve results while reducing
computational demands. AlphaZero's design is purely based on self-play and
makes no use of labeled expert data or domain-specific enhancements; it is
designed to learn from scratch. We propose a novel approach to deal with this
cold-start problem by employing simple search enhancements at the beginning
phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE)
and dynamically weighted combinations of these with the neural network, and
Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that
most of these enhancements improve the performance of their baseline player in
three different (small) board games, with especially RAVE based variants
playing strongly.
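As a rough illustration of the dynamically weighted combination idea, here is a minimal sketch in which an enhancement value (from Rollout or RAVE) is blended with the neural network value during the early phase of self-play training; the linear ramp and all names below are our own assumptions, not the paper's API.

```python
def warm_start_value(state, nn_value, enhancement_value, iteration,
                     warm_start_iters=50):
    """Blend a cheap search enhancement (e.g. Rollout or RAVE) with the
    neural network value estimate during early self-play training.

    `nn_value` and `enhancement_value` are callables mapping a state to a
    scalar value estimate; `warm_start_iters` is the assumed length of the
    warm-start phase. Illustrative sketch only.
    """
    # Weight ramps linearly from 0 (trust only the enhancement, since the
    # untrained network is uninformative) to 1 (trust only the network).
    w = min(1.0, iteration / warm_start_iters)
    return w * nn_value(state) + (1.0 - w) * enhancement_value(state)
```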
Related papers
- Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning [0.0]
We present Lucy-SKG, a Reinforcement Learning-based model that learned how to play Rocket League in a sample-efficient manner.
Our contributions include the development of a reward analysis and visualization library, a novel parameterizable reward shape function, and auxiliary neural architectures.
arXiv Detail & Related papers (2023-05-25T07:33:17Z)
- Targeted Search Control in AlphaZero for Effective Policy Improvement [93.30151539224144]
We introduce Go-Exploit, a novel search control strategy for AlphaZero.
Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest.
Go-Exploit learns with a greater sample efficiency than standard AlphaZero.
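A minimal sketch of the archive-based start-state sampling described above; the capacity, eviction rule, and fallback probability are our assumptions, not the paper's exact scheme.

```python
import random

class StateArchive:
    """Archive of 'states of interest' visited in previous self-play games.

    Sketch of Go-Exploit-style search control: instead of always starting
    self-play from the initial position, sample start states from the
    archive. Uniform sampling over recent states is an assumption here.
    """

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.states = []

    def add(self, state):
        self.states.append(state)
        if len(self.states) > self.capacity:
            self.states.pop(0)  # drop the oldest state

    def sample_start_state(self, initial_state, p_initial=0.2):
        # With some probability fall back to the true initial state so the
        # agent still learns the opening.
        if not self.states or random.random() < p_initial:
            return initial_state
        return random.choice(self.states)
```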
arXiv Detail & Related papers (2023-02-23T22:50:24Z)
- A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games [104.3339905200105]
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games.
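The abstract does not give the update rule; as a hedged sketch (our reading of magnetic mirror descent, not a quotation from the paper), the iterate is mirror descent with an additional proximal "magnetic" term pulling toward a magnet $z'$:

```latex
% Schematic magnetic mirror descent update (our reading, not quoted from
% the paper): mirror descent plus a proximal "magnetic" term.
\[
  z_{t+1} \;=\; \arg\min_{z \in \mathcal{Z}}\;
    \eta \,\langle \nabla f(z_t),\, z \rangle
    \;+\; \eta \alpha \, B_\psi(z ;\, z')
    \;+\; B_\psi(z ;\, z_t),
\]
% where $B_\psi$ is the Bregman divergence of the mirror map $\psi$,
% $z'$ is the magnet, $\eta$ the step size, and $\alpha$ the strength
% of the magnetic attraction.
```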
arXiv Detail & Related papers (2022-06-12T19:49:14Z)
- AlphaZero-Inspired General Board Game Learning and Playing [0.0]
Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning.
In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents.
We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper.
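For context, the PUCT selection rule at the heart of an AlphaZero-style MCTS wrapper can be sketched as follows; the node structure is assumed, and this is a generic formulation, not the paper's code.

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing the PUCT score used in AlphaZero-style
    MCTS. `node.children` maps actions to child nodes carrying a prior `p`
    (from the wrapped agent's policy), a visit count `n`, and a mean value
    `q`. Generic sketch only.
    """
    total_n = sum(child.n for child in node.children.values())
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        # Exploration bonus shrinks as the child accumulates visits.
        u = c_puct * child.p * math.sqrt(total_n + 1) / (1 + child.n)
        score = child.q + u
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```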
arXiv Detail & Related papers (2022-04-28T07:04:14Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
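The two-layer structure can be sketched generically: a meta-algorithm maintains multiplicative weights over black-box base-learners. The update below is plain Hedge, a stand-in for the paper's actual meta-algorithm and base-learner property.

```python
import math

def hedge_meta_step(weights, base_losses, lr=0.1):
    """One round of a two-layer scheme: reweight base-learners by their
    observed losses via multiplicative weights, then renormalize.
    Generic sketch of the meta-algorithm idea."""
    new = [w * math.exp(-lr * loss) for w, loss in zip(weights, base_losses)]
    z = sum(new)
    return [w / z for w in new]
```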
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning [5.55810668640617]
We propose a warm-start enhancement method for Monte Carlo Tree Search.
We show that our approach works better than a fixed $I'$, especially for "deep" tactical games.
We conclude that AlphaZero-like deep reinforcement learning benefits from adaptive rollout based warm-start.
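One plausible form of such an adaptive switch is sketched below; the head-to-head win-rate criterion is our assumption, standing in for the paper's adaptive test.

```python
def use_warm_start(nn_wins, enhancement_wins, min_games=20):
    """Adaptive warm-start: keep using the rollout-based enhancement while
    it still outperforms the current neural network in evaluation games.
    Assumed criterion, not the paper's exact rule."""
    total = nn_wins + enhancement_wins
    if total < min_games:
        return True  # not enough evidence yet; stay in the warm-start phase
    return enhancement_wins / total > 0.5
```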
arXiv Detail & Related papers (2021-05-13T08:24:51Z)
- Combining Off and On-Policy Training in Model-Based Reinforcement Learning [77.34726150561087]
We propose a way to obtain off-policy targets using data from simulated games in MuZero.
Our results show that these targets speed up the training process and lead to faster convergence and higher rewards.
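A generic sketch of such a target: discounted rewards from a simulated game plus a bootstrapped value at the horizon. This is a standard n-step off-policy target, not MuZero's actual code.

```python
def off_policy_value_target(rewards, bootstrap_value, gamma=0.997):
    """n-step value target built from a simulated game: fold the rewards
    back from the horizon, bootstrapping with a value estimate there."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```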
arXiv Detail & Related papers (2021-02-24T10:47:26Z)
- Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to Deep Reinforcement Learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
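One common way to handle complex categorical state spaces is per-feature embedding tables, sketched below; this is a generic pattern, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CategoricalStateEncoder(nn.Module):
    """Embed each categorical feature of a game state separately and
    concatenate, so changed design parameters (e.g. new item or tile types)
    only grow embedding tables instead of forcing a full retrain.
    Generic sketch only."""

    def __init__(self, cardinalities, dim=16):
        super().__init__()
        self.tables = nn.ModuleList(
            [nn.Embedding(num_embeddings=c, embedding_dim=dim)
             for c in cardinalities]
        )

    def forward(self, state_ids):
        # state_ids: LongTensor of shape [batch, n_features]
        parts = [table(state_ids[:, i]) for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)
```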
arXiv Detail & Related papers (2020-12-07T08:47:25Z)
- Chrome Dino Run using Reinforcement Learning [0.0]
We study the most popular model-free reinforcement learning algorithms, along with a convolutional neural network, to train an agent to play the game Chrome Dino Run.
We use two popular temporal-difference approaches, namely Deep Q-Learning and Expected SARSA, and also implement a Double DQN model to train the agent.
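The Double DQN target, which decouples action selection (online network) from action evaluation (target network) to reduce Q-value overestimation, is standard and can be sketched as follows; the network objects are assumed, not taken from the paper.

```python
import torch

def double_dqn_targets(rewards, next_states, dones,
                       online_net, target_net, gamma=0.99):
    """Double DQN target: the online network picks the greedy next action,
    the target network evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```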
arXiv Detail & Related papers (2020-08-15T22:18:20Z)
- Learning Compositional Neural Programs for Continuous Control [62.80551956557359]
We propose a novel solution to challenging sparse-reward, continuous control problems.
Our solution, dubbed AlphaNPI-X, involves three separate stages of learning.
We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse manipulation tasks.
arXiv Detail & Related papers (2020-07-27T08:27:14Z)
- AutoEG: Automated Experience Grafting for Off-Policy Deep Reinforcement Learning [11.159797940803593]
We develop an algorithm, called Experience Grafting (EG), to enable RL agents to reorganize segments of the few high-quality trajectories from the experience pool.
We further develop an AutoEG agent that automatically learns to adjust the grafting-based learning strategy.
Results collected from a set of six robotic control environments show that, in comparison to a standard deep RL algorithm (DDPG), AutoEG increases the speed of the learning process by at least 30%.
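A very rough sketch of the grafting idea follows; the selection and splicing rules here are placeholders, whereas the paper's EG algorithm splices segments at compatible states.

```python
import random

def graft_segments(trajectories, top_k=10, segment_len=20):
    """Take the few highest-return trajectories from the experience pool,
    cut them into fixed-length segments, and recombine the segments into
    new training sequences. Placeholder sketch of experience grafting."""
    best = sorted(trajectories, key=lambda t: t["return"], reverse=True)[:top_k]
    segments = []
    for traj in best:
        steps = traj["steps"]
        for i in range(0, len(steps), segment_len):
            segments.append(steps[i:i + segment_len])
    random.shuffle(segments)  # stand-in for state-compatible splicing
    return segments
```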
arXiv Detail & Related papers (2020-04-22T17:07:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.