Automatic tuning of hyper-parameters of reinforcement learning
algorithms using Bayesian optimization with behavioral cloning
- URL: http://arxiv.org/abs/2112.08094v1
- Date: Wed, 15 Dec 2021 13:10:44 GMT
- Title: Automatic tuning of hyper-parameters of reinforcement learning
algorithms using Bayesian optimization with behavioral cloning
- Authors: Juan Cruz Barsce, Jorge A. Palombarini, Ernesto C. Martínez
- Abstract summary: In reinforcement learning (RL), the information content of data gathered by the learning agent is dependent on the setting of many hyper-parameters.
In this work, a novel approach for autonomous hyper-parameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to manual tweaking and other optimization-based approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal setting of several hyper-parameters in machine learning algorithms is
key to make the most of available data. To this aim, several methods such as
evolutionary strategies, random search, Bayesian optimization and heuristic
rules of thumb have been proposed. In reinforcement learning (RL), the
information content of data gathered by the learning agent while interacting
with its environment is heavily dependent on the setting of many
hyper-parameters. Therefore, the user of an RL algorithm has to rely on
search-based optimization methods, such as grid search or the Nelder-Mead
simplex algorithm, which are very inefficient for most RL tasks, slow down
the learning curve significantly, and leave to the user the burden of
purposefully biasing data gathering. In this work, in order to make an RL
algorithm more user-independent, a novel approach for autonomous
hyper-parameter setting using Bayesian optimization is proposed. Data from past
episodes and different hyper-parameter values are used at a meta-learning level
by performing behavioral cloning, which helps improve the effectiveness of
maximizing a reinforcement learning variant of an acquisition function. Also,
by tightly integrating Bayesian optimization in a reinforcement learning agent
design, the number of state transitions needed to converge to the optimal
policy for a given task is reduced. Computational experiments reveal promising
results compared to manual tweaking and other optimization-based approaches,
which highlights the benefits of changing the algorithm hyper-parameters to
increase the information content of generated data.
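The abstract does not include an implementation, so the following is only a minimal sketch of the outer Bayesian-optimization loop it describes, under stated assumptions: a Gaussian-process surrogate with an RBF kernel, an expected-improvement acquisition maximized by random candidate sampling, and a hypothetical stand-in `train_rl_agent` for the inner RL training run. The paper's behavioral-cloning refinement of the acquisition step is not reproduced here.

```python
# Sketch: Bayesian optimization over RL hyper-parameters (assumptions noted above).
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel with unit signal variance.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    # Standard GP regression posterior mean/std at query points Xq.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y.mean()))
    mu = y.mean() + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI for maximizing the (noisy) average return.
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def train_rl_agent(hp):
    # Hypothetical placeholder: train an agent with hyper-parameters
    # hp = (learning rate, exploration rate) and return its average return.
    lr, eps = hp
    return -((lr - 0.3) ** 2 + (eps - 0.1) ** 2) + 0.01 * np.random.randn()

rng = np.random.default_rng(0)
X = rng.uniform(size=(3, 2))            # initial hyper-parameter trials in [0, 1]^2
y = np.array([train_rl_agent(h) for h in X])
for _ in range(20):                      # outer BO iterations
    cand = rng.uniform(size=(256, 2))    # random candidate hyper-parameters
    mu, sd = gp_posterior(X, y, cand)
    h_next = cand[np.argmax(expected_improvement(mu, sd, y.max()))]
    X = np.vstack([X, h_next])
    y = np.append(y, train_rl_agent(h_next))
print("best hyper-parameters:", X[np.argmax(y)], "average return:", y.max())
```

In the paper's variant, data from past episodes gathered under different hyper-parameter values additionally trains a behavioral-cloning model at the meta-level, which biases the maximization of an RL variant of the acquisition function; the plain random-candidate search above is a simplification.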
Related papers
- Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning [1.3654846342364308]
We use the Efficient Global Optimization algorithm to train an RL agent in a simulation environment.
This yields a substantial increase of 4% over existing manually tuned parameters.
arXiv Detail & Related papers (2024-07-19T12:40:08Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Hyperparameter Adaptive Search for Surrogate Optimization: A
Self-Adjusting Approach [1.6317061277457001]
Surrogate optimization (SO) algorithms have shown promise for optimizing expensive black-box functions.
Our approach identifies and modifies the most influential hyper-parameters specific to each problem and SO approach.
Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms.
arXiv Detail & Related papers (2023-10-12T01:26:05Z) - Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant
Fuel Optimization [0.0]
This work presents a first-of-its-kind approach that utilizes deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization.
arXiv Detail & Related papers (2023-05-09T23:51:24Z) - Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyper-parameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3×-30×.
arXiv Detail & Related papers (2022-03-15T19:25:01Z) - Consolidated learning -- a domain-specific model-free optimization
strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z) - Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z) - Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian
Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z)