Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
- URL: http://arxiv.org/abs/2006.07554v1
- Date: Sat, 13 Jun 2020 03:54:26 GMT
- Title: Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
- Authors: Yunhao Tang, Krzysztof Choromanski
- Abstract summary: We propose a framework which entails the application of Evolutionary Strategies to online hyper-parameter tuning in off-policy learning.
Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces.
- Score: 41.13416324282365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy learning algorithms have been known to be sensitive to the choice
of hyper-parameters. However, unlike near on-policy algorithms for which
hyper-parameters could be optimized via e.g. meta-gradients, similar techniques
could not be straightforwardly applied to off-policy learning. In this work, we
propose a framework which entails the application of Evolutionary Strategies to
online hyper-parameter tuning in off-policy learning. Our formulation draws
close connections to meta-gradients and leverages the strengths of black-box
optimization with relatively low-dimensional search spaces. We show that our
method outperforms state-of-the-art off-policy learning baselines with static
hyper-parameters and recent prior work over a wide range of continuous control
benchmarks.
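For intuition, the black-box view in the abstract can be sketched as a standard antithetic Evolutionary Strategies update on a low-dimensional hyper-parameter vector. The snippet below is an illustrative sketch, not the authors' implementation: `train_and_evaluate` is a hypothetical callback standing in for running an off-policy learner (e.g. a TD3/SAC-style agent) for a short segment under the given hyper-parameters and returning a scalar performance estimate.
```python
import numpy as np

# Illustrative sketch of antithetic Evolutionary Strategies over a low-dimensional
# hyper-parameter vector.  `train_and_evaluate` is a hypothetical callback that
# runs an off-policy learner for a short window under the given hyper-parameters
# and returns a scalar performance estimate (NOT the paper's exact procedure).

def es_hyperparam_update(theta, train_and_evaluate,
                         sigma=0.1, lr=0.01, population=8, iterations=50):
    theta = np.asarray(theta, dtype=float)
    for _ in range(iterations):
        eps = np.random.randn(population, theta.size)
        # Antithetic (mirrored) perturbations lower the variance of the estimator.
        r_pos = np.array([train_and_evaluate(theta + sigma * e) for e in eps])
        r_neg = np.array([train_and_evaluate(theta - sigma * e) for e in eps])
        grad = ((r_pos - r_neg)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        theta = theta + lr * grad  # ascend the black-box performance estimate
    return theta
```
In an online setting, a loop of this kind would be interleaved with off-policy training rather than run to convergence, so the hyper-parameters are adjusted while the agent keeps learning.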
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned to convergence, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning [0.38073142980732994]
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best performing set.
The online weighted Q-ensemble showed lower overall variance and superior results compared with Q-average ensembles.
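As a rough illustration of the weighted-ensemble idea in this entry (hedged: the member interface `q_values`, the `recent_return` attribute, and the softmax weighting rule below are assumptions for illustration, not the paper's definitions), one can weight each member's Q-estimates by its recent performance and act on the weighted combination instead of a plain average:
```python
import numpy as np

# Hypothetical sketch: combine Q-estimates from agents trained with different
# hyperparameter sets, weighting each member by its recent episodic return.
# The interface (`q_values`, `recent_return`) is illustrative, not from the paper.

def ensemble_action(members, state, temperature=1.0):
    returns = np.array([m.recent_return for m in members])
    # Softmax over recent returns: better-performing hyperparameter sets get
    # more influence; a plain average would weight all members equally.
    weights = np.exp(returns / temperature)
    weights /= weights.sum()
    q = sum(w * m.q_values(state) for w, m in zip(weights, members))
    return int(np.argmax(q))  # greedy action under the weighted ensemble
```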
arXiv Detail & Related papers (2022-09-29T19:57:43Z)
- A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation [2.741266294612776]
Offline policy evaluation (OPE) is a core technology for data-driven decision optimization without environment simulators.
We introduce a new approximate hyperparameter selection (AHS) framework for OPE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner.
We derive four AHS methods, each of which has different characteristics such as convergence rate and time complexity.
arXiv Detail & Related papers (2022-01-07T02:23:09Z)
- Episodic Policy Gradient Training [43.62408764384791]
We introduce Episodic Policy Gradient Training (EPGT), a novel training procedure for policy gradient methods in which episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly.
Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
arXiv Detail & Related papers (2021-12-03T11:15:32Z)
- Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [10.457660611114457]
We show how to select between policies and value functions produced by different training algorithms in offline reinforcement learning.
We use BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate its effectiveness in discrete-action benchmarks such as Atari.
arXiv Detail & Related papers (2021-10-26T20:12:11Z)
- Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
Gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization.
We propose a novel hyperparameter optimization (HO) method that overcomes this limitation by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
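To make the RNN analogy in the last entry concrete, the toy sketch below carries the sensitivity of the parameters to a single hyper-parameter (the learning rate) forward through training, in the spirit of real-time recurrent learning, and adjusts that hyper-parameter online from a validation hypergradient. The linear-regression setup and every name in it are assumptions for illustration only, not the paper's algorithm.
```python
import numpy as np

# Toy forward-mode sketch of online hyperparameter optimization: the sensitivity
# dw/d(lr) is accumulated forward through the SGD updates ("RTRL-style") and
# used to adjust the learning rate from a validation hypergradient.
# The linear-regression setup is purely illustrative, not the paper's algorithm.

def online_lr_tuning(X, y, X_val, y_val, steps=1000, meta_lr=1e-3):
    w = np.zeros(X.shape[1])
    lr = 0.01                        # hyperparameter tuned online
    dw_dlr = np.zeros_like(w)        # forward-accumulated sensitivity dw/d(lr)
    for t in range(steps):
        x_i, y_i = X[t % len(X)], y[t % len(y)]
        grad = (x_i @ w - y_i) * x_i              # training gradient at one example
        dgrad_dlr = (x_i @ dw_dlr) * x_i          # chain rule through w
        dw_dlr = dw_dlr - grad - lr * dgrad_dlr   # sensitivity of the SGD update
        w = w - lr * grad                         # ordinary parameter update
        # Hypergradient of the validation loss with respect to the learning rate.
        val_grad = X_val.T @ (X_val @ w - y_val)
        lr = max(lr - meta_lr * float(val_grad @ dw_dlr), 1e-6)
    return w, lr
```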