Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.15078v1
- Date: Thu, 29 Sep 2022 19:57:43 GMT
- Title: Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in
Reinforcement Learning
- Authors: Renata Garcia and Wouter Caarls
- Abstract summary: Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best performing set.
The online weighted Q-ensemble presented lower overall variance and superior results compared with Q-average ensembles.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is a promising paradigm for learning robot
control, allowing complex control policies to be learned without requiring a
dynamics model. However, even state-of-the-art algorithms can be difficult to
tune for optimum performance. We propose employing an ensemble of multiple
reinforcement learning agents, each with a different set of hyperparameters,
along with a mechanism for choosing the best performing set(s) online. In the
literature, the ensemble technique is used to improve performance in general,
but the current work specifically addresses decreasing the hyperparameter
tuning effort. Furthermore, our approach targets online learning on a single
robotic system, and does not require running multiple simulators in parallel.
Although the idea is generic, Deep Deterministic Policy Gradient was the model
chosen, being a representative deep actor-critic method with good performance
in continuous action settings but known high variance. We compare our online
weighted Q-ensemble approach to the Q-average ensemble strategies addressed in
the literature, using both alternate policy training and online training, and
demonstrate the advantage of the new approach in eliminating hyperparameter
tuning. The applicability to real-world systems was validated in common
robotic benchmark environments: the half-cheetah bipedal robot and the
swimmer. The online weighted Q-ensemble presented lower overall variance and
superior results compared with Q-average ensembles using randomized
parameterizations.
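
As a rough illustration of the idea, the sketch below combines the actions of several agents through online weights that shift toward members with low temporal-difference error. The softmax-style weight update, the linear toy actors and critics, and all constants are assumptions made for the example; the paper's actual members are full DDPG learners and its weighting rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3  # ensemble members, each standing in for a DDPG agent with its own hyperparameters

# Toy linear stand-ins for each member's actor and critic (hypothetical; the
# real members are full DDPG networks trained with distinct hyperparameter sets).
actor_params = rng.normal(size=(K, 4))    # member i proposes action a_i = w_i . s
critic_params = rng.normal(size=(K, 4))   # member i estimates value v_i . s (action ignored for brevity)

weights = np.ones(K) / K                  # online ensemble weights over members
eta = 0.5                                 # weight learning rate (assumed constant)

def select_action(state):
    """Act with the weight-averaged ensemble policy (one possible combination rule)."""
    return float(weights @ (actor_params @ state))

def update_weights(state, reward, next_state, gamma=0.99):
    """Shift weight toward members with low TD error (illustrative softmax-style rule)."""
    global weights
    td_error = np.abs(reward + gamma * critic_params @ next_state
                      - critic_params @ state)
    weights = weights * np.exp(-eta * td_error)   # multiplicative-weights update
    weights /= weights.sum()

# One simulated transition on a toy 4-dimensional state space.
s, s_next = rng.normal(size=4), rng.normal(size=4)
a = select_action(s)
update_weights(s, reward=1.0, next_state=s_next)
print(f"action={a:.3f}, weights={np.round(weights, 3)}")
```

Over many transitions, the weights concentrate on the members whose critics track the observed returns best, which is what removes the need to pick a single hyperparameter set up front.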
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
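
A minimal sketch of action quantization for offline RL: the paper learns its discretization adaptively, whereas here a plain k-means codebook fitted to the offline action distribution stands in as the adaptive scheme. The dataset and codebook size are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical offline dataset of 2-D continuous actions.
actions = rng.normal(size=(1000, 2))

def fit_codebook(data, n_codes=8, iters=20):
    """Plain k-means: learn a discrete codebook adapted to the action distribution."""
    codes = data[rng.choice(len(data), n_codes, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - codes[None]) ** 2).sum(-1), axis=1)
        for k in range(n_codes):
            if np.any(assign == k):
                codes[k] = data[assign == k].mean(axis=0)
    return codes

codebook = fit_codebook(actions)

def quantize(a):
    """Map a continuous action to its nearest discrete code index."""
    return int(np.argmin(((codebook - a) ** 2).sum(-1)))

a = np.array([0.3, -0.7])
idx = quantize(a)
print(f"action {a} -> code {idx} = {np.round(codebook[idx], 3)}")
```

An offline RL method such as IQL or CQL would then operate over the code indices rather than raw continuous actions.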
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
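
To make the two-level structure concrete, the sketch below runs an EXP3-style outer bandit over a discretized grid of UCB exploration coefficients for an inner stochastic bandit. The discretization and all constants are assumptions; the paper's framework tunes the hyperparameter continuously.

```python
import numpy as np

rng = np.random.default_rng(2)

true_means = np.array([0.2, 0.5, 0.8])    # inner 3-armed bandit (means unknown)

# Outer layer: candidate UCB exploration coefficients (a discretized stand-in
# for the paper's continuous tuning framework).
alphas = np.array([0.1, 0.5, 1.0, 2.0])
log_w = np.zeros(len(alphas))
eta, mix = 0.1, 0.1

counts, sums = np.zeros(3), np.zeros(3)

for t in range(1, 2001):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    p = (1 - mix) * p + mix / len(alphas)       # keep exploring candidates
    j = rng.choice(len(alphas), p=p)            # pick an exploration coefficient
    if np.any(counts == 0):                     # play each inner arm once first
        arm = int(np.argmin(counts))
    else:
        ucb = sums / counts + alphas[j] * np.sqrt(np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    sums[arm] += reward
    log_w[j] += eta * reward / p[j]             # EXP3-style importance-weighted update

print("preferred exploration coefficient:", alphas[int(np.argmax(log_w))])
```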
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL [28.31529154045046]
We propose a new approach to tune hyperparameters from offline logs of data.
We first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model.
We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
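
A toy version of that recipe, with a tabular "calibration model" fitted from logged transitions of a four-state chain and Q-learning simulated inside it for several candidate learning rates. The environment, candidate values, and episode counts are invented for illustration.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)

# Hypothetical offline log on a 4-state chain: action 1 moves right, 0 moves
# left, and acting in state 3 yields reward 1.
logs = [(s, a, float(s == 3), min(max(s + (1 if a else -1), 0), 3))
        for s in range(4) for a in (0, 1) for _ in range(25)]

# 1) Fit the calibration model: empirical transition/reward tables.
model = defaultdict(list)
for s, a, r, s2 in logs:
    model[(s, a)].append((r, s2))

def model_step(s, a):
    """Sample the learned calibration model instead of the real environment."""
    r, s2 = model[(s, a)][rng.integers(len(model[(s, a)]))]
    return r, s2

# 2) Simulate learning inside the model for each candidate hyperparameter.
def simulated_return(alpha, episodes=200, gamma=0.9, eps=0.1):
    Q, total = np.zeros((4, 2)), 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            r, s2 = model_step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            total += r
            s = s2
    return total

for alpha in (0.01, 0.1, 0.5):
    print(f"alpha={alpha}: return in calibration model = {simulated_return(alpha):.0f}")
```

The candidate with the best simulated return would then be deployed for real training, without ever tuning against the real system.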
arXiv Detail & Related papers (2022-05-18T04:26:23Z)
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot quadruped and a radio-controlled (RC) car.
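
The sketch below shows the overall pattern under strong simplifications: a linear dynamics model is fitted by least squares (standing in for the paper's neural network), and a goal-reaching action sequence is then optimized by backpropagating the terminal cost through the learned rollout. All matrices and constants are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground-truth (unknown) dynamics, used only to generate training data.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# 1) Learn a dynamics model from random transitions via least squares.
S = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
S_next = S @ A_true.T + U @ B_true.T
X = np.hstack([S, U])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)   # S_next ~ X @ W
A, B = W[:2].T, W[2:].T

# 2) Gradient-based trajectory optimization through the learned model.
T, goal = 20, np.array([1.0, 0.0])
u = np.zeros((T, 1))
for _ in range(200):
    s = np.zeros(2)
    states = [s]
    for t in range(T):                            # forward rollout
        s = A @ s + B @ u[t]
        states.append(s)
    lam = 2 * (states[-1] - goal)                 # dJ/ds_T for J = ||s_T - goal||^2
    grads = np.zeros_like(u)
    for t in reversed(range(T)):                  # backward pass through dynamics
        grads[t] = B.T @ lam
        lam = A.T @ lam
    u -= 0.05 * grads

print("final state:", np.round(states[-1], 3), "goal:", goal)
```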
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
- Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that with more generations, the algorithm finds optimal solutions that require fewer training episodes and are computationally cheap, while being more robust for deployment.
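
A compact sketch of a variable-length genetic algorithm over hyperparameter genomes. The genome layout, the toy fitness proxy, and the mutation rates are assumptions; a real framework would evaluate fitness by actually training each RL agent, and the paper additionally distributes those evaluations.

```python
import math
import random

random.seed(5)

# Variable-length genome: a list of hidden-layer sizes plus a learning rate,
# standing in for an RL agent's hyperparameters.
def random_genome():
    return {"lr": 10 ** random.uniform(-4, -1),
            "layers": [random.choice([32, 64, 128])
                       for _ in range(random.randint(1, 4))]}

def fitness(g):
    # Toy proxy for "train the agent and measure return": peaks at lr=1e-3
    # and two 64-unit layers. A real framework would run training here.
    lr_score = -abs(math.log10(g["lr"]) + 3)
    size_score = -abs(len(g["layers"]) - 2) - sum(abs(h - 64) for h in g["layers"]) / 128
    return lr_score + size_score

def crossover(a, b):
    cut = random.randint(0, min(len(a["layers"]), len(b["layers"])))
    return {"lr": random.choice([a["lr"], b["lr"]]),
            "layers": a["layers"][:cut] + b["layers"][cut:]}

def mutate(g):
    if random.random() < 0.3:
        g["layers"].append(random.choice([32, 64, 128]))   # genome can grow...
    if random.random() < 0.3 and len(g["layers"]) > 1:
        g["layers"].pop()                                  # ...or shrink
    return g

pop = [random_genome() for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                     # elitist selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(10)]
print("best genome:", max(pop, key=fitness))
```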
arXiv Detail & Related papers (2022-01-26T20:43:13Z)
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
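
The paper's optimiser uses implicit differentiation; as a simpler single-run illustration of hypergradient-based tuning, the sketch below adapts a learning rate online with the classic hypergradient-descent rule dL/d(lr) = -g_t . g_{t-1} on a toy regression problem. The problem, step sizes, and bounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy least-squares problem standing in for a training run.
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)

w = np.zeros(5)
lr = 1e-3                        # hyperparameter tuned during the single run
beta = 1e-5                      # hypergradient step size (assumed)
prev_grad = np.zeros(5)

for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    # Since w_t = w_{t-1} - lr * g_{t-1}, the loss gradient w.r.t. lr is
    # dL/d(lr) = -g_t . g_{t-1}; descend on it to adapt lr online.
    lr = max(lr + beta * (grad @ prev_grad), 1e-6)
    w -= lr * grad
    prev_grad = grad

print(f"tuned lr={lr:.4f}, final loss={np.mean((X @ w - y) ** 2):.2e}")
```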
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curriculum paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation across platforms as a few-shot meta-learning problem, where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- Hyperparameter Auto-tuning in Self-Supervised Robotic Learning [12.193817049957733]
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
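
One simple reading of ELBO-based auto-tuning is a stopping rule that ends training once the windowed ELBO improvement plateaus, avoiding both insufficient and redundant learning. The synthetic ELBO trace, window size, and tolerance below are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic "ELBO per epoch" trace: rises, then plateaus with noise (a
# stand-in for a self-supervised model's real training signal).
elbo = -np.exp(-np.arange(200) / 30.0) + rng.normal(scale=0.005, size=200)

def auto_stop(trace, window=10, tol=1e-3):
    """Stop training once the windowed ELBO improvement falls below tol."""
    for t in range(2 * window, len(trace)):
        recent = np.mean(trace[t - window:t])
        older = np.mean(trace[t - 2 * window:t - window])
        if recent - older < tol:
            return t
    return len(trace)

print("auto-tuned stopping epoch:", auto_stop(elbo))
```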
arXiv Detail & Related papers (2020-10-16T08:58:24Z)
- Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies [41.13416324282365]
We propose a framework which entails the application of Evolutionary Strategies to online hyperparameter tuning in off-policy learning.
Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces.
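
A minimal sketch of the ES loop: Gaussian perturbations of a log-scaled hyperparameter vector are scored by a black-box objective, and the mean is moved along the standard ES gradient estimate. The quadratic toy objective stands in for "run the off-policy learner and report return", and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hyperparameters tuned in log space: [log10 learning rate, log10 noise scale].
theta = np.array([-4.0, -1.5])
sigma, alpha, pop = 0.2, 0.05, 16

def score(h):
    # Toy black-box objective standing in for "run the off-policy learner with
    # these hyperparameters and report return"; it peaks at lr=1e-3, noise=0.1.
    return -((h[0] + 3.0) ** 2 + (h[1] + 1.0) ** 2) + rng.normal(scale=0.1)

for _ in range(100):
    eps = rng.normal(size=(pop, 2))
    rewards = np.array([score(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += alpha / (pop * sigma) * eps.T @ rewards   # ES gradient estimate

print("tuned hyperparameters:", np.round(10 ** theta, 4))
```

The low-dimensional search space is what keeps this black-box approach cheap relative to perturbing the policy weights themselves.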
arXiv Detail & Related papers (2020-06-13T03:54:26Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
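
The sketch below shows the MPC side of that connection: an MPPI-style controller samples action sequences through a (possibly biased) model, bootstraps each rollout with a value estimate at the horizon, and exponentially weights the samples. The toy point-mass model and quadratic value function are assumptions standing in for learned components.

```python
import numpy as np

rng = np.random.default_rng(8)

def model_step(s, a):
    """Toy (possibly biased) point-mass model: the state drifts by 0.1 * action."""
    return s + 0.1 * a

def value(s):
    """Stand-in for a learned value/Q estimate used to bootstrap the horizon."""
    return -np.sum(s ** 2)

def mppi(s0, horizon=15, samples=64, lam=1.0):
    """Information-theoretic MPC: weight sampled controls by exponentiated return."""
    noise = rng.normal(size=(samples, horizon, 2))
    returns = np.zeros(samples)
    for k in range(samples):
        s = s0.copy()
        for t in range(horizon):
            s = model_step(s, noise[k, t])
            returns[k] += -0.1 * np.sum(s ** 2)       # running reward
        returns[k] += value(s)                         # bootstrap with value estimate
    w = np.exp((returns - returns.max()) / lam)
    w /= w.sum()
    return (w[:, None, None] * noise).sum(axis=0)[0]   # first action of the plan

s = np.array([1.0, -1.0])
for _ in range(20):
    s = model_step(s, mppi(s))
print("state after 20 MPPI steps:", np.round(s, 3))
```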
arXiv Detail & Related papers (2019-12-31T00:29:22Z)