Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2007.14604v2
- Date: Thu, 30 Jul 2020 06:16:00 GMT
- Title: Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning
- Authors: Lars Hertel, Pierre Baldi, Daniel L. Gillen
- Abstract summary: Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds.
We benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions.
- Score: 7.559006677497745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms can show strong variation in performance
between training runs with different random seeds. In this paper we explore how
this affects hyperparameter optimization when the goal is to find
hyperparameter settings that perform well across random seeds. In particular,
we benchmark whether it is better to explore a large quantity of hyperparameter
settings via pruning of bad performers, or if it is better to aim for quality
of collected results by using repetitions. For this we consider the Successive
Halving, Random Search, and Bayesian Optimization algorithms, the latter two
with and without repetitions. We apply these to tuning the PPO2 algorithm on
the Cartpole balancing task and the Inverted Pendulum Swing-up task. We
demonstrate that pruning may negatively affect the optimization and that
repeated sampling does not help in finding hyperparameter settings that perform
better across random seeds. From our experiments we conclude that Bayesian
optimization with a noise-robust acquisition function is the best choice for
hyperparameter optimization in reinforcement learning tasks.
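The following is a minimal Python sketch of the quantity-versus-quality trade-off, built on stated assumptions. A hypothetical toy objective, `noisy_return`, stands in for a PPO2 training run; it is not the authors' Cartpole or pendulum setup. Both strategies get the same total budget of 81 units. `random_search` spends it on 3 configurations with 3 seeds each. `successive_halving` spends it on 27 single-seed configurations and keeps the top third at each rung (eta = 3).

```python
import random
import statistics


def noisy_return(lr, seed, budget):
    """Hypothetical stand-in for one RL training run (higher is better).

    The true quality peaks at lr = 0.3; the seed-dependent noise shrinks
    as the training budget grows, mimicking longer runs being more reliable.
    """
    rng = random.Random(seed)
    true_quality = -(lr - 0.3) ** 2
    return true_quality + rng.gauss(0.0, 0.1 / budget ** 0.5)


def random_search(n_configs, n_repeats, budget, rng):
    """'Quality': few configurations, each scored by its mean over n_repeats seeds."""
    best_lr, best_score, cost = None, float("-inf"), 0
    for _ in range(n_configs):
        lr = rng.uniform(0.0, 1.0)
        runs = [noisy_return(lr, rng.randrange(10**6), budget) for _ in range(n_repeats)]
        cost += n_repeats * budget
        score = statistics.mean(runs)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, cost


def successive_halving(n_configs, min_budget, eta, rng):
    """'Quantity': many configurations, one seed each; keep the top 1/eta
    at every rung and give the survivors eta times more budget."""
    configs = [rng.uniform(0.0, 1.0) for _ in range(n_configs)]
    budget, cost = min_budget, 0
    while len(configs) > 1:
        scores = [noisy_return(lr, rng.randrange(10**6), budget) for lr in configs]
        cost += len(configs) * budget
        ranked = sorted(zip(scores, configs), reverse=True)
        configs = [lr for _, lr in ranked[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0], cost


if __name__ == "__main__":
    rng = random.Random(0)
    print("random search, 3 seeds each:", random_search(3, 3, 9, rng))
    print("successive halving:", successive_halving(27, 1, 3, rng))
```

Each rung ranks configurations on a single noisy sample, so an unlucky seed can prune a good configuration early. This is one way pruning can hurt the search.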
Related papers
- Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function.
We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic.
Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
Date: 2024-06-26T12:23:54Z
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
Date: 2024-06-04T20:33:22Z
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially takes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it can achieve sublinear regret in theory and that it performs consistently better than all existing methods on both synthetic and real datasets.
Date: 2023-02-18T23:31:20Z
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
Date: 2022-05-26T12:51:32Z
- Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of data gathered by the learning agent depends on the setting of many hyperparameters.
In this work, a novel approach for autonomous hyperparameter setting using Bayesian optimization is proposed.
Experiments show promising results compared with manual tweaking and other optimization-based approaches.
Date: 2021-12-15T13:10:44Z
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
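The STORM estimator at the core of this line of work can be sketched in Python as follows. This is a minimal illustration under assumptions, not STORM+ itself. It uses a fixed step size `lr` and a fixed momentum weight `a` instead of the adaptive schedules the summary mentions. It also uses a hypothetical 1-D toy loss whose samples are drawn from N(3, 1). The key idea is the recursive correction: the same sample is evaluated at the current and the previous iterate, so much of the sampling noise cancels.

```python
import random


def grad(x, xi):
    """Stochastic gradient of the toy loss f(x; xi) = 0.5 * (x - xi)**2."""
    return x - xi


def storm(x0, lr=0.1, a=0.1, steps=500, seed=0):
    """Fixed-step, fixed-momentum STORM-style recursive momentum on the toy loss.

    d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)), where the same
    sample xi_t is used at both points.
    """
    rng = random.Random(seed)
    x_prev = x0
    d = grad(x0, rng.gauss(3.0, 1.0))  # plain stochastic gradient to start
    x = x0 - lr * d
    for _ in range(steps):
        xi = rng.gauss(3.0, 1.0)  # xi ~ N(3, 1), so the true minimizer is x = 3
        d = grad(x, xi) + (1.0 - a) * (d - grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x


if __name__ == "__main__":
    print(storm(0.0))
```

On this toy problem, the iterate should settle near the minimizer x = 3 with less jitter than plain SGD at the same step size.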
Date: 2021-11-01T15:43:36Z
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
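An averaged zeroth-order hyper-gradient can be sketched in Python with a generic two-point Gaussian-smoothing estimator. This is a stand-in technique, not necessarily the exact estimator of HOZOG. `val_loss` is a hypothetical black box; in practice it would train a model with hyperparameters `h` and return the validation loss. The smoothing radius `mu` and the number of directions `q` are illustrative choices.

```python
import random


def val_loss(h):
    """Hypothetical black-box validation loss of hyperparameter vector h."""
    target = [0.5, -1.0]
    return sum((hi - ti) ** 2 for hi, ti in zip(h, target))


def zo_hypergrad(f, h, mu=1e-3, q=20, rng=None):
    """Average of q two-point estimates (f(h + mu*u) - f(h)) / mu * u, u ~ N(0, I)."""
    rng = rng or random.Random(0)
    base = f(h)
    g = [0.0] * len(h)
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in h]
        shifted = [hi + mu * ui for hi, ui in zip(h, u)]
        scale = (f(shifted) - base) / mu  # directional finite difference
        for j, uj in enumerate(u):
            g[j] += scale * uj / q
    return g


def tune(h0, lr=0.05, steps=200, seed=0):
    """Gradient descent on hyperparameters using only function evaluations."""
    rng = random.Random(seed)
    h = list(h0)
    for _ in range(steps):
        g = zo_hypergrad(val_loss, h, rng=rng)
        h = [hi - lr * gi for hi, gi in zip(h, g)]
    return h


if __name__ == "__main__":
    print(tune([0.0, 0.0]))
```

No derivative of `val_loss` is ever computed. This is what makes the approach usable when the training procedure is a black box.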
Date: 2021-02-17T21:03:05Z
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy-gradient setting.
This is an interesting step for constructing self-tuning quadratics.
Date: 2020-11-09T22:07:30Z
- Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or reduce the hyper-parameter search space in order to select a better combination.
Date: 2020-06-03T08:53:48Z
- Weighted Random Search for Hyperparameter Optimization [0.0]
We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms.
Unlike standard RS, we generate new values for each hyperparameter with a probability of change.
Within the same computational budget, our method yields better results than the standard RS.
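Resampling each hyperparameter with a probability of change can be sketched in Python as below. This is a hypothetical reading, not the paper's algorithm. The sketch assumes that a candidate starts from the current best configuration and that each hyperparameter is redrawn uniformly with a fixed, user-supplied probability `p_change[name]`; the paper may derive these probabilities differently. Setting every probability to 1 recovers standard random search.

```python
import random


def weighted_random_search(objective, space, p_change, n_iter, seed=0):
    """Minimize objective over a box.

    space:    name -> (low, high) bounds (hypothetical interface)
    p_change: name -> probability of redrawing that hyperparameter (assumed fixed)
    """
    rng = random.Random(seed)
    best = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
    best_score = objective(best)
    for _ in range(n_iter):
        # Each hyperparameter is redrawn with its own probability; otherwise
        # the candidate keeps the best value found so far.
        cand = {
            k: (rng.uniform(*space[k]) if rng.random() < p_change[k] else v)
            for k, v in best.items()
        }
        score = objective(cand)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    def toy(c):
        return (c["x"] - 0.2) ** 2 + (c["y"] - 0.7) ** 2

    space = {"x": (0.0, 1.0), "y": (0.0, 1.0)}
    print(weighted_random_search(toy, space, {"x": 0.5, "y": 0.5}, 300))
```

Keeping unchanged hyperparameters at their best-known values lets improvements in one coordinate persist while others are still being explored.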
Date: 2020-04-03T15:41:22Z
- Towards Automatic Bayesian Optimization: A first step involving acquisition functions [0.0]
Bayesian optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions whose analytical expression we do not have access to.
We propose a first attempt at automatic Bayesian optimization by exploring several techniques that automatically tune the acquisition function.
Date: 2020-03-21T12:22:45Z
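To illustrate what an acquisition function is, here is a short Python sketch of expected improvement (EI) for minimization, one of the standard choices such methods select among. The sketch assumes a surrogate model (for example, a Gaussian process) has already produced a posterior mean `mu` and standard deviation `sigma` for each candidate; that model is not implemented here. `pick_next` is a hypothetical helper that proposes the candidate with the highest EI.

```python
import math


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for minimization: E[max(f_best - xi - f(x), 0)] with f(x) ~ N(mu, sigma^2)."""
    improvement = f_best - xi - mu
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no posterior uncertainty left
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


def pick_next(candidates, f_best):
    """candidates: list of (x, mu, sigma); return the x with the largest EI."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], f_best))[0]


if __name__ == "__main__":
    cands = [("a", 0.0, 0.1), ("b", 0.0, 1.0), ("c", 0.5, 0.2)]
    print(pick_next(cands, f_best=0.0))
```

The `sigma * pdf` term rewards uncertainty, so EI trades off exploiting a low predicted mean against exploring uncertain regions. The margin `xi` shifts that balance toward exploration.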