Related papers: No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

URL: http://arxiv.org/abs/2205.08716v1
Date: Wed, 18 May 2022 04:26:23 GMT
Title: No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
Authors: Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White
Abstract summary: We propose a new approach to tune hyperparameters from offline logs of data. We first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model. We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
Score: 28.31529154045046
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time consuming. We propose a new approach to tune hyperparameters from offline logs of data, to fully specify the hyperparameters for an RL agent that learns online in the real world. The approach is conceptually simple: we first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model to identify promising hyperparameters. We identify several criteria to make this strategy effective, and develop an approach that satisfies these criteria. We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.

Related papers

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining [56.58170370127227]
We show that optimal learning rate follows a power-law relationship with both model parameters and data sizes, while optimal batch size scales primarily with data sizes. This work is the first work that unifies different model shapes and structures, such as Mixture-of-Experts models and dense transformers.
arXiv Detail & Related papers (2025-03-06T18:58:29Z)
Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU) Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
AutoRL Hyperparameter Landscapes [69.15927869840918]
Reinforcement Learning (RL) has shown to be capable of producing impressive results, but its use is limited by the impact of its hyperparameters on performance. We propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points in time throughout training. This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for more insights on AutoRL problems that can be gained through landscape analyses.
arXiv Detail & Related papers (2023-04-05T12:14:41Z)
A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning [8.659973888018781]
A Reinforcement Learning (RL) system depends on a set of initial conditions that affect the system's performance. We propose a framework based on integrating complex event processing and temporal models, to alleviate these trade-offs. We tested the proposed approach in a 5G mobile communications case study that uses DQN, a variant of RL, for its decision-making.
arXiv Detail & Related papers (2023-03-09T11:30:40Z)
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience. We propose the first online continuous hyperparameter tuning framework for contextual bandits. We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
Hyperparameters in Contextual RL are Highly Situational [16.328866317851183]
Reinforcement Learning (RL) has shown impressive results in games and simulation, but real-world application suffers from its instability under changing environment conditions. We show that the hyper parameters found by HPO methods are not only dependent on the problem at hand, but even on how well the state describes the environment dynamics.
arXiv Detail & Related papers (2022-12-21T09:38:18Z)
Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning [0.38073142980732994]
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model. We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyper parameters, along with a mechanism for choosing the best performing set. Online weighted Q-Ensemble presented overall lower variance and superior results when compared with q-average ensembles.
arXiv Detail & Related papers (2022-09-29T19:57:43Z)
Automating DBSCAN via Deep Reinforcement Learning [73.82740568765279]
We propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process. The framework consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively.
arXiv Detail & Related papers (2022-08-09T04:40:11Z)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning. We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
Hyperparameter Selection for Offline Reinforcement Learning [61.92834684647419]
offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. Existing hyperparameter selection methods for offline RL break the offline assumption.
arXiv Detail & Related papers (2020-07-17T15:30:38Z)
Robust Federated Learning Through Representation Matching and Adaptive Hyper-parameters [5.319361976450981]
Federated learning is a distributed, privacy-aware learning scenario which trains a single model on data belonging to several clients. Current federated learning methods struggle in cases with heterogeneous client-side data distributions. We propose a novel representation matching scheme that reduces the divergence of local models.
arXiv Detail & Related papers (2019-12-30T20:19:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.