Hyperparameters in Reinforcement Learning and How To Tune Them
- URL: http://arxiv.org/abs/2306.01324v1
- Date: Fri, 2 Jun 2023 07:48:18 GMT
- Title: Hyperparameters in Reinforcement Learning and How To Tune Them
- Authors: Theresa Eimer, Marius Lindauer, Roberta Raileanu
- Abstract summary: We show that hyper parameter choices in deep reinforcement learning can significantly affect the agent's final performance and sample efficiency.
We propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds.
We support this by comparing state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts.
- Score: 25.782420501870295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to improve reproducibility, deep reinforcement learning (RL) has
been adopting better scientific practices such as standardized evaluation
metrics and reporting. However, the process of hyperparameter optimization
still varies widely across papers, which makes it challenging to compare RL
algorithms fairly. In this paper, we show that hyperparameter choices in RL can
significantly affect the agent's final performance and sample efficiency, and
that the hyperparameter landscape can strongly depend on the tuning seed which
may lead to overfitting. We therefore propose adopting established best
practices from AutoML, such as the separation of tuning and testing seeds, as
well as principled hyperparameter optimization (HPO) across a broad search
space. We support this by comparing multiple state-of-the-art HPO tools on a
range of RL algorithms and environments to their hand-tuned counterparts,
demonstrating that HPO approaches often have higher performance and lower
compute overhead. As a result of our findings, we recommend a set of best
practices for the RL community, which should result in stronger empirical
results with fewer computational costs, better reproducibility, and thus faster
progress. In order to encourage the adoption of these practices, we provide
plug-and-play implementations of the tuning algorithms used in this paper at
https://github.com/facebookresearch/how-to-autorl.
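The core practice the abstract advocates, tuning on one set of seeds and reporting on a disjoint set, can be illustrated with a minimal sketch. Everything below (the search space, the random-search loop, and the synthetic `train_and_evaluate` objective) is an illustrative assumption rather than the paper's actual setup; the linked repository provides the plug-and-play implementations of the tuning algorithms actually used.
```python
import math
import random
import statistics

# Hypothetical search space for a generic RL algorithm; the paper's
# spaces are broader and algorithm-specific.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),      # sampled log-uniformly
    "discount_gamma": (0.9, 0.9999),
    "clip_range": (0.1, 0.4),
}

TUNING_SEEDS = [0, 1, 2]            # used only during HPO
TEST_SEEDS = [10, 11, 12, 13, 14]   # disjoint seeds, used only for final reporting


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration (log-uniform for the learning rate)."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "discount_gamma": rng.uniform(*SEARCH_SPACE["discount_gamma"]),
        "clip_range": rng.uniform(*SEARCH_SPACE["clip_range"]),
    }


def train_and_evaluate(config: dict, seed: int) -> float:
    """Placeholder objective: replace with a real RL training run.

    Here we return a synthetic, seed-dependent noisy score so the sketch
    runs end to end; it is not a real agent.
    """
    rng = random.Random(seed)
    lr_term = -abs(math.log10(config["learning_rate"]) - math.log10(3e-4))
    gamma_term = -abs(config["discount_gamma"] - 0.99) * 10
    return lr_term + gamma_term + rng.gauss(0, 0.3)


def random_search(budget: int = 20) -> dict:
    """Tune on TUNING_SEEDS only, then report the incumbent on TEST_SEEDS."""
    rng = random.Random(42)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(rng)
        score = statistics.mean(train_and_evaluate(config, s) for s in TUNING_SEEDS)
        if score > best_score:
            best_config, best_score = config, score
    # Reported performance comes from seeds never seen during tuning.
    test_score = statistics.mean(train_and_evaluate(best_config, s) for s in TEST_SEEDS)
    print(f"tuning score: {best_score:.2f}, held-out test score: {test_score:.2f}")
    return best_config


if __name__ == "__main__":
    random_search()
```
The important detail is that `TEST_SEEDS` never influences which configuration is selected, so the reported score measures how well the tuned configuration generalizes rather than how well it overfits the tuning seeds.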
Related papers
- ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning [42.33815055388433]
ARLBench is a benchmark for hyperparameter optimization (HPO) in reinforcement learning (RL)
It allows comparisons of diverse HPO approaches while being highly efficient in evaluation.
ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL.
arXiv Detail & Related papers (2024-09-27T15:22:28Z)
- PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning [49.92394599459274]
We propose PriorBand, an HPO algorithm tailored to Deep Learning (DL) pipelines.
We show its robustness across a range of DL benchmarks and show its gains under informative expert input and against poor expert beliefs.
arXiv Detail & Related papers (2023-06-21T16:26:14Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has been shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance.
We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training.
This supports the view that hyperparameters should be adjusted dynamically during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
- A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning [8.659973888018781]
A Reinforcement Learning (RL) system depends on a set of initial conditions that affect the system's performance.
We propose a framework based on integrating complex event processing and temporal models to alleviate these trade-offs.
We tested the proposed approach in a 5G mobile communications case study that uses DQN, a deep RL algorithm, for its decision-making.
arXiv Detail & Related papers (2023-03-09T11:30:40Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation [48.821062916381685]
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing.
In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL.
The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset and two real-world medical image segmentation datasets.
arXiv Detail & Related papers (2022-03-12T04:11:42Z)
- Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that, over more generations, the framework finds optimal solutions that require fewer training episodes, are computationally cheaper, and are more robust for deployment.
arXiv Detail & Related papers (2022-01-26T20:43:13Z)
- Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of the data gathered by the learning agent depends on the setting of many hyperparameters.
In this work, a novel approach for autonomous hyperparameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to other manual tweaking and optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Sample-Efficient Automated Deep Reinforcement Learning [33.53903358611521]
We propose a population-based automated RL framework to meta-optimize arbitrary off-policy RL algorithms.
By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization.
We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm in the MuJoCo benchmark suite.
arXiv Detail & Related papers (2020-09-03T10:04:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.