Hyperparameters in Reinforcement Learning and How To Tune Them
- URL: http://arxiv.org/abs/2306.01324v1
- Date: Fri, 2 Jun 2023 07:48:18 GMT
- Title: Hyperparameters in Reinforcement Learning and How To Tune Them
- Authors: Theresa Eimer, Marius Lindauer, Roberta Raileanu
- Abstract summary: We show that hyper parameter choices in deep reinforcement learning can significantly affect the agent's final performance and sample efficiency.
We propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds.
We support this by comparing state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts.
- Score: 25.782420501870295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to improve reproducibility, deep reinforcement learning (RL) has
been adopting better scientific practices such as standardized evaluation
metrics and reporting. However, the process of hyperparameter optimization
still varies widely across papers, which makes it challenging to compare RL
algorithms fairly. In this paper, we show that hyperparameter choices in RL can
significantly affect the agent's final performance and sample efficiency, and
that the hyperparameter landscape can strongly depend on the tuning seed which
may lead to overfitting. We therefore propose adopting established best
practices from AutoML, such as the separation of tuning and testing seeds, as
well as principled hyperparameter optimization (HPO) across a broad search
space. We support this by comparing multiple state-of-the-art HPO tools on a
range of RL algorithms and environments to their hand-tuned counterparts,
demonstrating that HPO approaches often have higher performance and lower
compute overhead. As a result of our findings, we recommend a set of best
practices for the RL community, which should result in stronger empirical
results with fewer computational costs, better reproducibility, and thus faster
progress. In order to encourage the adoption of these practices, we provide
plug-and-play implementations of the tuning algorithms used in this paper at
https://github.com/facebookresearch/how-to-autorl.
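The core practice the abstract advocates, tuning on one set of seeds and reporting on a disjoint set, can be illustrated with a minimal sketch. Everything below (the search space, the random-search loop, and the synthetic `train_and_evaluate` objective) is an illustrative assumption rather than the paper's actual setup; the linked repository provides the plug-and-play implementations of the tuning algorithms actually used.
```python
import math
import random
import statistics

# Hypothetical search space for a generic RL algorithm; the paper's
# spaces are broader and algorithm-specific.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),      # sampled log-uniformly
    "discount_gamma": (0.9, 0.9999),
    "clip_range": (0.1, 0.4),
}

TUNING_SEEDS = [0, 1, 2]            # used only during HPO
TEST_SEEDS = [10, 11, 12, 13, 14]   # disjoint seeds, used only for final reporting


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration (log-uniform for the learning rate)."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "discount_gamma": rng.uniform(*SEARCH_SPACE["discount_gamma"]),
        "clip_range": rng.uniform(*SEARCH_SPACE["clip_range"]),
    }


def train_and_evaluate(config: dict, seed: int) -> float:
    """Placeholder objective: replace with a real RL training run.

    Here we return a synthetic, seed-dependent noisy score so the sketch
    runs end to end; it is not a real agent.
    """
    rng = random.Random(seed)
    lr_term = -abs(math.log10(config["learning_rate"]) - math.log10(3e-4))
    gamma_term = -abs(config["discount_gamma"] - 0.99) * 10
    return lr_term + gamma_term + rng.gauss(0, 0.3)


def random_search(budget: int = 20) -> dict:
    """Tune on TUNING_SEEDS only, then report the incumbent on TEST_SEEDS."""
    rng = random.Random(42)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(rng)
        score = statistics.mean(train_and_evaluate(config, s) for s in TUNING_SEEDS)
        if score > best_score:
            best_config, best_score = config, score
    # Reported performance comes from seeds never seen during tuning.
    test_score = statistics.mean(train_and_evaluate(best_config, s) for s in TEST_SEEDS)
    print(f"tuning score: {best_score:.2f}, held-out test score: {test_score:.2f}")
    return best_config


if __name__ == "__main__":
    random_search()
```
The important detail is that `TEST_SEEDS` never influences which configuration is selected, so the reported score measures how well the tuned configuration generalizes rather than how well it overfits the tuning seeds.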
Related papers
- ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning [42.33815055388433]
ARLBench is a benchmark for hyperparameter optimization (HPO) in reinforcement learning (RL)
It allows comparisons of diverse HPO approaches while being highly efficient in evaluation.
ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL.
arXiv Detail & Related papers (2024-09-27T15:22:28Z)
- PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning [49.92394599459274]
We propose PriorBand, an HPO algorithm tailored to Deep Learning (DL) pipelines.
We show its robustness across a range of DL benchmarks and show its gains under informative expert input and against poor expert beliefs.
arXiv Detail & Related papers (2023-06-21T16:26:14Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has been shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance.
We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training.
This supports the view that hyperparameters should be adjusted dynamically during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
- A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning [8.659973888018781]
A Reinforcement Learning (RL) system depends on a set of initial conditions that affect the system's performance.
We propose a framework based on integrating complex event processing and temporal models to alleviate these trade-offs.
We tested the proposed approach in a 5G mobile communications case study that uses DQN, a deep RL algorithm, for its decision-making.
arXiv Detail & Related papers (2023-03-09T11:30:40Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation [48.821062916381685]
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing.
In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL.
The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset and two real-world medical image segmentation datasets.
arXiv Detail & Related papers (2022-03-12T04:11:42Z)
- Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that, over more generations, the framework finds optimal solutions that require fewer training episodes, are computationally cheaper, and are more robust for deployment.
arXiv Detail & Related papers (2022-01-26T20:43:13Z)
- Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of the data gathered by the learning agent depends on the setting of many hyperparameters.
In this work, a novel approach for autonomous hyperparameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to other manual tweaking and optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Sample-Efficient Automated Deep Reinforcement Learning [33.53903358611521]
We propose a population-based automated RL framework to meta-optimize arbitrary off-policy RL algorithms.
By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization.
We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm in the MuJoCo benchmark suite.
arXiv Detail & Related papers (2020-09-03T10:04:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.