Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in
Self-Play?
- URL: http://arxiv.org/abs/2003.05988v1
- Date: Thu, 12 Mar 2020 19:28:48 GMT
- Title: Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in
Self-Play?
- Authors: Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat
- Abstract summary: In self-play, Monte Carlo Tree Search is used to train a deep neural network, which is then used in tree searches.
We evaluate how these parameters contribute to training in an AlphaZero-like self-play algorithm.
We find surprising results where too much training can sometimes lead to lower performance.
- Score: 4.534822382040738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The landmark achievements of AlphaGo Zero have created great research
interest in self-play in reinforcement learning. In self-play, Monte Carlo
Tree Search is used to train a deep neural network, which is then used in tree
searches. Training itself is governed by many hyper-parameters. There has been
surprisingly little research on design choices for hyper-parameter values and
loss-functions, presumably because of the prohibitive computational cost to
explore the parameter space. In this paper, we investigate 12 hyper-parameters
in an AlphaZero-like self-play algorithm and evaluate how these parameters
contribute to training. We use small games to achieve meaningful exploration
with moderate computational effort. The experimental results show that training
is highly sensitive to hyper-parameter choices. Through multi-objective
analysis we identify 4 important hyper-parameters to further assess. To start,
we find surprising results where too much training can sometimes lead to lower
performance. Our main result is that the number of self-play iterations
subsumes MCTS-search simulations, game-episodes, and training epochs. The
intuition is that these three increase together as self-play iterations
increase, and that increasing them individually is sub-optimal. A consequence
of our experiments is a direct recommendation for setting hyper-parameter
values in self-play: the overarching outer loop of self-play iterations should
be maximized, in preference to the three inner-loop hyper-parameters, which should
be set at lower values. A secondary result of our experiments concerns the
choice of optimization goals, for which we also provide recommendations.
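To make the recommendation concrete, the sketch below shows how the four hyper-parameters singled out above nest in a generic AlphaZero-style loop: self-play iterations form the outer loop, while MCTS simulations per move, game episodes per iteration, and training epochs per iteration are inner-loop budgets. This is a minimal illustration under that generic assumption; all function and parameter names (play_one_episode, episodes_per_iter, and so on) are hypothetical and not taken from the paper's code.

```python
"""Minimal sketch of an AlphaZero-style self-play loop, only to illustrate how
the outer-loop iteration count nests the three inner-loop hyper-parameters
discussed above. Function bodies are stubs; names are hypothetical."""

import random


def play_one_episode(net, mcts_simulations):
    """Stub: self-play one game, running `mcts_simulations` MCTS simulations
    per move, and return (state, policy target, outcome) training examples."""
    return [("state", "policy_target", random.choice([-1, 1]))]


def train_one_epoch(net, examples):
    """Stub: one pass of gradient updates over the self-play examples."""
    return net


def self_play_training(net,
                       iterations=100,        # outer loop: maximize this
                       episodes_per_iter=10,  # inner loop: keep small
                       mcts_simulations=25,   # inner loop: keep small
                       training_epochs=5):    # inner loop: keep small
    for _ in range(iterations):
        examples = []
        # 1) Self-play: generate fresh training data guided by MCTS.
        for _ in range(episodes_per_iter):
            examples += play_one_episode(net, mcts_simulations)
        # 2) Training: fit the network to the newly generated examples.
        for _ in range(training_epochs):
            net = train_one_epoch(net, examples)
        # Raising `iterations` implicitly raises the total number of MCTS
        # simulations, game episodes, and training epochs, rather than
        # raising any one of the inner values on its own.
    return net


if __name__ == "__main__":
    self_play_training(net=None, iterations=3)
```

Read this way, maximizing the iteration count while keeping the inner values modest spends the same total compute on more rounds of fresh self-play data, which matches the abstract's intuition that the three inner quantities grow together with the outer loop.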
Related papers
- Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function.
We conduct extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic.
Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
arXiv Detail & Related papers (2024-06-26T12:23:54Z)
- Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery [64.41455104593304]
Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences.
We propose to adapt similar RL-based methods to unsupervised object discovery.
We demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train.
arXiv Detail & Related papers (2023-10-29T17:03:12Z)
- AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has been shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance.
We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training.
This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
- Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning [72.83293818245978]
We design and learn a neural network (NN)-based auto-tuner for hyper-parameter tuning in sparse Bayesian learning.
We show that considerable improvement in convergence rate and recovery performance can be achieved.
arXiv Detail & Related papers (2022-11-09T12:34:59Z)
- Goal-Oriented Sensitivity Analysis of Hyperparameters in Deep Learning [0.0]
We study the use of goal-oriented sensitivity analysis, based on the Hilbert-Schmidt Independence Criterion (HSIC), for hyperparameter analysis and optimization.
We derive an HSIC-based optimization algorithm that we apply to MNIST and CIFAR, classical machine learning data sets of interest for scientific machine learning.
arXiv Detail & Related papers (2022-07-13T14:21:12Z)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models [132.90062129639705]
We propose a novel training strategy that encourages all parameters to be trained sufficiently.
A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate.
In contrast, a parameter with high sensitivity is well-trained and we regularize it by decreasing its learning rate to prevent further overfitting.
arXiv Detail & Related papers (2022-02-06T00:22:28Z)
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
- To tune or not to tune? An Approach for Recommending Important Hyperparameters [2.121963121603413]
We model the relationship between the performance of machine learning models and their hyperparameters to discover trends and gain insights.
Our results enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy.
arXiv Detail & Related papers (2021-08-30T08:54:58Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning [7.559006677497745]
Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds.
We benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions.
arXiv Detail & Related papers (2020-07-29T05:12:34Z)