Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in
Self-Play?
- URL: http://arxiv.org/abs/2003.05988v1
- Date: Thu, 12 Mar 2020 19:28:48 GMT
- Title: Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in
Self-Play?
- Authors: Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat
- Abstract summary: In self-play, Monte Carlo Tree Search is used to train a deep neural network, which is then used in tree searches.
We evaluate how these parameters contribute to training in an AlphaZero-like self-play algorithm.
We find surprising results where too much training can sometimes lead to lower performance.
- Score: 4.534822382040738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The landmark achievements of AlphaGo Zero have created great research
interest in self-play in reinforcement learning. In self-play, Monte Carlo
Tree Search is used to train a deep neural network, which is then used in tree
searches. Training itself is governed by many hyper-parameters. There has been
surprisingly little research on design choices for hyper-parameter values and
loss-functions, presumably because of the prohibitive computational cost to
explore the parameter space. In this paper, we investigate 12 hyper-parameters
in an AlphaZero-like self-play algorithm and evaluate how these parameters
contribute to training. We use small games to achieve meaningful exploration
with moderate computational effort. The experimental results show that training
is highly sensitive to hyper-parameter choices. Through multi-objective
analysis we identify 4 important hyper-parameters to further assess. To start,
we find surprising results where too much training can sometimes lead to lower
performance. Our main result is that the number of self-play iterations
subsumes MCTS-search simulations, game-episodes, and training epochs. The
intuition is that these three increase together as self-play iterations
increase, and that increasing them individually is sub-optimal. A consequence
of our experiments is a direct recommendation for setting hyper-parameter
values in self-play: the overarching outer loop of self-play iterations should
be maximized, in preference to the three inner-loop hyper-parameters, which should
be set at lower values. A secondary result of our experiments concerns the
choice of optimization goals, for which we also provide recommendations.
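To make the recommendation concrete, the sketch below shows how the four hyper-parameters singled out above nest in a generic AlphaZero-style loop: self-play iterations form the outer loop, while MCTS simulations per move, game episodes per iteration, and training epochs per iteration are inner-loop budgets. This is a minimal illustration under that generic assumption; all function and parameter names (play_one_episode, episodes_per_iter, and so on) are hypothetical and not taken from the paper's code.

```python
"""Minimal sketch of an AlphaZero-style self-play loop, only to illustrate how
the outer-loop iteration count nests the three inner-loop hyper-parameters
discussed above. Function bodies are stubs; names are hypothetical."""

import random


def play_one_episode(net, mcts_simulations):
    """Stub: self-play one game, running `mcts_simulations` MCTS simulations
    per move, and return (state, policy target, outcome) training examples."""
    return [("state", "policy_target", random.choice([-1, 1]))]


def train_one_epoch(net, examples):
    """Stub: one pass of gradient updates over the self-play examples."""
    return net


def self_play_training(net,
                       iterations=100,        # outer loop: maximize this
                       episodes_per_iter=10,  # inner loop: keep small
                       mcts_simulations=25,   # inner loop: keep small
                       training_epochs=5):    # inner loop: keep small
    for _ in range(iterations):
        examples = []
        # 1) Self-play: generate fresh training data guided by MCTS.
        for _ in range(episodes_per_iter):
            examples += play_one_episode(net, mcts_simulations)
        # 2) Training: fit the network to the newly generated examples.
        for _ in range(training_epochs):
            net = train_one_epoch(net, examples)
        # Raising `iterations` implicitly raises the total number of MCTS
        # simulations, game episodes, and training epochs, rather than
        # raising any one of the inner values on its own.
    return net


if __name__ == "__main__":
    self_play_training(net=None, iterations=3)
```

Read this way, maximizing the iteration count while keeping the inner values modest spends the same total compute on more rounds of fresh self-play data, which matches the abstract's intuition that the three inner quantities grow together with the outer loop.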
Related papers
- Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function.
We conduct extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic.
Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
arXiv Detail & Related papers (2024-06-26T12:23:54Z)
- Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery [64.41455104593304]
Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences.
We propose to adapt similar RL-based methods to unsupervised object discovery.
We demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train.
arXiv Detail & Related papers (2023-10-29T17:03:12Z)
- AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has been shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance.
We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training.
This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
- Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning [72.83293818245978]
We design and learn a neural network (NN)-based auto-tuner for hyper-parameter tuning in sparse Bayesian learning.
We show that considerable improvement in convergence rate and recovery performance can be achieved.
arXiv Detail & Related papers (2022-11-09T12:34:59Z)
- Goal-Oriented Sensitivity Analysis of Hyperparameters in Deep Learning [0.0]
We study the use of goal-oriented sensitivity analysis, based on the Hilbert-Schmidt Independence Criterion (HSIC), for hyperparameter analysis and optimization.
We derive an HSIC-based optimization algorithm that we apply to MNIST and CIFAR, classical machine learning data sets of interest for scientific machine learning.
arXiv Detail & Related papers (2022-07-13T14:21:12Z)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models [132.90062129639705]
We propose a novel training strategy that encourages all parameters to be trained sufficiently.
A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate.
In contrast, a parameter with high sensitivity is well-trained and we regularize it by decreasing its learning rate to prevent further overfitting.
arXiv Detail & Related papers (2022-02-06T00:22:28Z)
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
- To tune or not to tune? An Approach for Recommending Important Hyperparameters [2.121963121603413]
We model the relationship between the performance of machine learning models and their hyperparameters to discover trends and gain insights.
Our results enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy.
arXiv Detail & Related papers (2021-08-30T08:54:58Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning [7.559006677497745]
Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds.
We benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions.
arXiv Detail & Related papers (2020-07-29T05:12:34Z)