Automatic tuning of hyper-parameters of reinforcement learning
algorithms using Bayesian optimization with behavioral cloning
- URL: http://arxiv.org/abs/2112.08094v1
- Date: Wed, 15 Dec 2021 13:10:44 GMT
- Title: Automatic tuning of hyper-parameters of reinforcement learning
algorithms using Bayesian optimization with behavioral cloning
- Authors: Juan Cruz Barsce, Jorge A. Palombarini, Ernesto C. Martínez
- Abstract summary: In reinforcement learning (RL), the information content of data gathered by the learning agent is dependent on the setting of many hyper-parameters.
In this work, a novel approach for autonomous hyper-parameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to manual tweaking and other optimization-based approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal setting of several hyper-parameters in machine learning algorithms is
key to make the most of available data. To this aim, several methods such as
evolutionary strategies, random search, Bayesian optimization and heuristic
rules of thumb have been proposed. In reinforcement learning (RL), the
information content of data gathered by the learning agent while interacting
with its environment is heavily dependent on the setting of many
hyper-parameters. Therefore, the user of an RL algorithm has to rely on
search-based optimization methods, such as grid search or the Nelder-Mead
simplex algorithm, which are very inefficient for most RL tasks, slow down
the learning curve significantly, and leave to the user the burden of
purposefully biasing data gathering. In this work, in order to make an RL
algorithm more user-independent, a novel approach for autonomous
hyper-parameter setting using Bayesian optimization is proposed. Data from past
episodes and different hyper-parameter values are used at a meta-learning level
by performing behavioral cloning, which helps improve the effectiveness of
maximizing a reinforcement learning variant of an acquisition function. Also,
by tightly integrating Bayesian optimization in a reinforcement learning agent
design, the number of state transitions needed to converge to the optimal
policy for a given task is reduced. Computational experiments reveal promising
results compared to manual tweaking and other optimization-based approaches,
which highlights the benefits of changing the algorithm hyper-parameters to
increase the information content of generated data.
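The abstract does not include an implementation, so the following is only a minimal sketch of the outer Bayesian-optimization loop it describes, under stated assumptions: a Gaussian-process surrogate with an RBF kernel, an expected-improvement acquisition maximized by random candidate sampling, and a hypothetical stand-in `train_rl_agent` for the inner RL training run. The paper's behavioral-cloning refinement of the acquisition step is not reproduced here.

```python
# Sketch: Bayesian optimization over RL hyper-parameters (assumptions noted above).
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel with unit signal variance.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    # Standard GP regression posterior mean/std at query points Xq.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y.mean()))
    mu = y.mean() + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI for maximizing the (noisy) average return.
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def train_rl_agent(hp):
    # Hypothetical placeholder: train an agent with hyper-parameters
    # hp = (learning rate, exploration rate) and return its average return.
    lr, eps = hp
    return -((lr - 0.3) ** 2 + (eps - 0.1) ** 2) + 0.01 * np.random.randn()

rng = np.random.default_rng(0)
X = rng.uniform(size=(3, 2))            # initial hyper-parameter trials in [0, 1]^2
y = np.array([train_rl_agent(h) for h in X])
for _ in range(20):                      # outer BO iterations
    cand = rng.uniform(size=(256, 2))    # random candidate hyper-parameters
    mu, sd = gp_posterior(X, y, cand)
    h_next = cand[np.argmax(expected_improvement(mu, sd, y.max()))]
    X = np.vstack([X, h_next])
    y = np.append(y, train_rl_agent(h_next))
print("best hyper-parameters:", X[np.argmax(y)], "average return:", y.max())
```

In the paper's variant, data from past episodes gathered under different hyper-parameter values additionally trains a behavioral-cloning model at the meta-level, which biases the maximization of an RL variant of the acquisition function; the plain random-candidate search above is a simplification.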
Related papers
- Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning [1.3654846342364308]
We use the Efficient Global Optimization algorithm to train an RL agent in a simulation environment.
This yields a substantial increase of 4% over existing manually tuned parameters.
arXiv Detail & Related papers (2024-07-19T12:40:08Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Hyperparameter Adaptive Search for Surrogate Optimization: A
Self-Adjusting Approach [1.6317061277457001]
Surrogate optimization (SO) algorithms have shown promise for optimizing expensive black-box functions.
Our approach identifies and modifies the most influential hyper-parameters specific to each problem and SO approach.
Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms.
arXiv Detail & Related papers (2023-10-12T01:26:05Z) - Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant
Fuel Optimization [0.0]
This work presents a first-of-its-kind approach that utilizes deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization.
arXiv Detail & Related papers (2023-05-09T23:51:24Z) - Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyper-parameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3×-30×.
arXiv Detail & Related papers (2022-03-15T19:25:01Z) - Consolidated learning -- a domain-specific model-free optimization
strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z) - Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z) - Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian
Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z)