Warm-Start AlphaZero Self-Play Search Enhancements
- URL: http://arxiv.org/abs/2004.12357v1
- Date: Sun, 26 Apr 2020 11:48:53 GMT
- Title: Warm-Start AlphaZero Self-Play Search Enhancements
- Authors: Hui Wang, Mike Preuss, Aske Plaat
- Abstract summary: Recently, AlphaZero has achieved landmark results in deep reinforcement learning.
We propose a novel approach to deal with this cold-start problem by employing simple search enhancements.
Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games.
- Score: 5.096685900776467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, AlphaZero has achieved landmark results in deep reinforcement
learning, by providing a single self-play architecture that learned three
different games at superhuman level. AlphaZero is a large and complicated
system with many parameters, and success requires much compute power and
fine-tuning. Reproducing results in other games is a challenge, and many
researchers are looking for ways to improve results while reducing
computational demands. AlphaZero's design is purely based on self-play and
makes no use of labeled expert data or domain-specific enhancements; it is
designed to learn from scratch. We propose a novel approach to deal with this
cold-start problem by employing simple search enhancements at the beginning
phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE)
and dynamically weighted combinations of these with the neural network, and
Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that
most of these enhancements improve the performance of their baseline player in
three different (small) board games, with especially RAVE based variants
playing strongly.
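As a rough illustration of the dynamically weighted combination idea, here is a minimal sketch in which an enhancement value (from Rollout or RAVE) is blended with the neural network value during the early phase of self-play training; the linear ramp and all names below are our own assumptions, not the paper's API.

```python
def warm_start_value(state, nn_value, enhancement_value, iteration,
                     warm_start_iters=50):
    """Blend a cheap search enhancement (e.g. Rollout or RAVE) with the
    neural network value estimate during early self-play training.

    `nn_value` and `enhancement_value` are callables mapping a state to a
    scalar value estimate; `warm_start_iters` is the assumed length of the
    warm-start phase. Illustrative sketch only.
    """
    # Weight ramps linearly from 0 (trust only the enhancement, since the
    # untrained network is uninformative) to 1 (trust only the network).
    w = min(1.0, iteration / warm_start_iters)
    return w * nn_value(state) + (1.0 - w) * enhancement_value(state)
```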
Related papers
- Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning [0.0]
We present Lucy-SKG, a Reinforcement Learning-based model that learned how to play Rocket League in a sample-efficient manner.
Our contributions include the development of a reward analysis and visualization library, a novel parameterizable reward shape function, and auxiliary neural architectures.
arXiv Detail & Related papers (2023-05-25T07:33:17Z)
- Targeted Search Control in AlphaZero for Effective Policy Improvement [93.30151539224144]
We introduce Go-Exploit, a novel search control strategy for AlphaZero.
Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest.
Go-Exploit learns with a greater sample efficiency than standard AlphaZero.
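A minimal sketch of the archive-based start-state sampling described above; the capacity, eviction rule, and fallback probability are our assumptions, not the paper's exact scheme.

```python
import random

class StateArchive:
    """Archive of 'states of interest' visited in previous self-play games.

    Sketch of Go-Exploit-style search control: instead of always starting
    self-play from the initial position, sample start states from the
    archive. Uniform sampling over recent states is an assumption here.
    """

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.states = []

    def add(self, state):
        self.states.append(state)
        if len(self.states) > self.capacity:
            self.states.pop(0)  # drop the oldest state

    def sample_start_state(self, initial_state, p_initial=0.2):
        # With some probability fall back to the true initial state so the
        # agent still learns the opening.
        if not self.states or random.random() < p_initial:
            return initial_state
        return random.choice(self.states)
```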
arXiv Detail & Related papers (2023-02-23T22:50:24Z)
- A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games [104.3339905200105]
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games.
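The abstract does not give the update rule; as a hedged sketch (our reading of magnetic mirror descent, not a quotation from the paper), the iterate is mirror descent with an additional proximal "magnetic" term pulling toward a magnet $z'$:

```latex
% Schematic magnetic mirror descent update (our reading, not quoted from
% the paper): mirror descent plus a proximal "magnetic" term.
\[
  z_{t+1} \;=\; \arg\min_{z \in \mathcal{Z}}\;
    \eta \,\langle \nabla f(z_t),\, z \rangle
    \;+\; \eta \alpha \, B_\psi(z ;\, z')
    \;+\; B_\psi(z ;\, z_t),
\]
% where $B_\psi$ is the Bregman divergence of the mirror map $\psi$,
% $z'$ is the magnet, $\eta$ the step size, and $\alpha$ the strength
% of the magnetic attraction.
```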
arXiv Detail & Related papers (2022-06-12T19:49:14Z)
- AlphaZero-Inspired General Board Game Learning and Playing [0.0]
Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning.
In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents.
We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper.
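For context, the PUCT selection rule at the heart of an AlphaZero-style MCTS wrapper can be sketched as follows; the node structure is assumed, and this is a generic formulation, not the paper's code.

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing the PUCT score used in AlphaZero-style
    MCTS. `node.children` maps actions to child nodes carrying a prior `p`
    (from the wrapped agent's policy), a visit count `n`, and a mean value
    `q`. Generic sketch only.
    """
    total_n = sum(child.n for child in node.children.values())
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        # Exploration bonus shrinks as the child accumulates visits.
        u = c_puct * child.p * math.sqrt(total_n + 1) / (1 + child.n)
        score = child.q + u
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```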
arXiv Detail & Related papers (2022-04-28T07:04:14Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
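The two-layer structure can be sketched generically: a meta-algorithm maintains multiplicative weights over black-box base-learners. The update below is plain Hedge, a stand-in for the paper's actual meta-algorithm and base-learner property.

```python
import math

def hedge_meta_step(weights, base_losses, lr=0.1):
    """One round of a two-layer scheme: reweight base-learners by their
    observed losses via multiplicative weights, then renormalize.
    Generic sketch of the meta-algorithm idea."""
    new = [w * math.exp(-lr * loss) for w, loss in zip(weights, base_losses)]
    z = sum(new)
    return [w / z for w in new]
```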
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning [5.55810668640617]
We propose a warm-start enhancement method for Monte Carlo Tree Search.
We show that our approach works better than a fixed $I'$, especially for "deep" tactical games.
We conclude that AlphaZero-like deep reinforcement learning benefits from adaptive rollout based warm-start.
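One plausible form of such an adaptive switch is sketched below; the head-to-head win-rate criterion is our assumption, standing in for the paper's adaptive test.

```python
def use_warm_start(nn_wins, enhancement_wins, min_games=20):
    """Adaptive warm-start: keep using the rollout-based enhancement while
    it still outperforms the current neural network in evaluation games.
    Assumed criterion, not the paper's exact rule."""
    total = nn_wins + enhancement_wins
    if total < min_games:
        return True  # not enough evidence yet; stay in the warm-start phase
    return enhancement_wins / total > 0.5
```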
arXiv Detail & Related papers (2021-05-13T08:24:51Z)
- Combining Off and On-Policy Training in Model-Based Reinforcement Learning [77.34726150561087]
We propose a way to obtain off-policy targets using data from simulated games in MuZero.
Our results show that these targets speed up the training process and lead to faster convergence and higher rewards.
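A generic sketch of such a target: discounted rewards from a simulated game plus a bootstrapped value at the horizon. This is a standard n-step off-policy target, not MuZero's actual code.

```python
def off_policy_value_target(rewards, bootstrap_value, gamma=0.997):
    """n-step value target built from a simulated game: fold the rewards
    back from the horizon, bootstrapping with a value estimate there."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```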
arXiv Detail & Related papers (2021-02-24T10:47:26Z)
- Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to Deep Reinforcement Learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
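One common way to handle complex categorical state spaces is per-feature embedding tables, sketched below; this is a generic pattern, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CategoricalStateEncoder(nn.Module):
    """Embed each categorical feature of a game state separately and
    concatenate, so changed design parameters (e.g. new item or tile types)
    only grow embedding tables instead of forcing a full retrain.
    Generic sketch only."""

    def __init__(self, cardinalities, dim=16):
        super().__init__()
        self.tables = nn.ModuleList(
            [nn.Embedding(num_embeddings=c, embedding_dim=dim)
             for c in cardinalities]
        )

    def forward(self, state_ids):
        # state_ids: LongTensor of shape [batch, n_features]
        parts = [table(state_ids[:, i]) for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)
```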
arXiv Detail & Related papers (2020-12-07T08:47:25Z)
- Chrome Dino Run using Reinforcement Learning [0.0]
We study the most popular model-free reinforcement learning algorithms, along with a convolutional neural network, to train an agent to play the game Chrome Dino Run.
We use two popular temporal-difference approaches, namely Deep Q-Learning and Expected SARSA, and also implement a Double DQN model to train the agent.
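The Double DQN target, which decouples action selection (online network) from action evaluation (target network) to reduce Q-value overestimation, is standard and can be sketched as follows; the network objects are assumed, not taken from the paper.

```python
import torch

def double_dqn_targets(rewards, next_states, dones,
                       online_net, target_net, gamma=0.99):
    """Double DQN target: the online network picks the greedy next action,
    the target network evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```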
arXiv Detail & Related papers (2020-08-15T22:18:20Z)
- Learning Compositional Neural Programs for Continuous Control [62.80551956557359]
We propose a novel solution to challenging sparse-reward, continuous control problems.
Our solution, dubbed AlphaNPI-X, involves three separate stages of learning.
We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse manipulation tasks.
arXiv Detail & Related papers (2020-07-27T08:27:14Z)
- AutoEG: Automated Experience Grafting for Off-Policy Deep Reinforcement Learning [11.159797940803593]
We develop an algorithm, called Experience Grafting (EG), to enable RL agents to reorganize segments of the few high-quality trajectories from the experience pool.
We further develop an AutoEG agent that automatically learns to adjust the grafting-based learning strategy.
Results collected from a set of six robotic control environments show that, in comparison to a standard deep RL algorithm (DDPG), AutoEG increases the speed of the learning process by at least 30%.
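A very rough sketch of the grafting idea follows; the selection and splicing rules here are placeholders, whereas the paper's EG algorithm splices segments at compatible states.

```python
import random

def graft_segments(trajectories, top_k=10, segment_len=20):
    """Take the few highest-return trajectories from the experience pool,
    cut them into fixed-length segments, and recombine the segments into
    new training sequences. Placeholder sketch of experience grafting."""
    best = sorted(trajectories, key=lambda t: t["return"], reverse=True)[:top_k]
    segments = []
    for traj in best:
        steps = traj["steps"]
        for i in range(0, len(steps), segment_len):
            segments.append(steps[i:i + segment_len])
    random.shuffle(segments)  # stand-in for state-compatible splicing
    return segments
```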
arXiv Detail & Related papers (2020-04-22T17:07:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.