Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.15078v1
- Date: Thu, 29 Sep 2022 19:57:43 GMT
- Title: Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in
Reinforcement Learning
- Authors: Renata Garcia and Wouter Caarls
- Abstract summary: Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best performing set.
The online weighted Q-ensemble presented lower overall variance and superior results compared with Q-average ensembles.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is a promising paradigm for learning robot
control, allowing complex control policies to be learned without requiring a
dynamics model. However, even state-of-the-art algorithms can be difficult to
tune for optimum performance. We propose employing an ensemble of multiple
reinforcement learning agents, each with a different set of hyperparameters,
along with a mechanism for choosing the best performing set(s) online. In the
literature, the ensemble technique is used to improve performance in general,
but the current work specifically addresses decreasing the hyperparameter
tuning effort. Furthermore, our approach targets online learning on a single
robotic system, and does not require running multiple simulators in parallel.
Although the idea is generic, Deep Deterministic Policy Gradient was the model
chosen, being a representative deep actor-critic method with good performance
in continuous action settings but known high variance. We compare our online
weighted Q-ensemble approach to the Q-average ensemble strategies addressed in
the literature, using both alternate policy training and online training, and
demonstrate the advantage of the new approach in eliminating hyperparameter
tuning. The applicability to real-world systems was validated in common
robotic benchmark environments: the half-cheetah bipedal robot and the
swimmer. The online weighted Q-ensemble presented lower overall variance and
superior results compared with Q-average ensembles using randomized
parameterizations.
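
As a rough illustration of the idea, the sketch below combines the actions of several agents through online weights that shift toward members with low temporal-difference error. The softmax-style weight update, the linear toy actors and critics, and all constants are assumptions made for the example; the paper's actual members are full DDPG learners and its weighting rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3  # ensemble members, each standing in for a DDPG agent with its own hyperparameters

# Toy linear stand-ins for each member's actor and critic (hypothetical; the
# real members are full DDPG networks trained with distinct hyperparameter sets).
actor_params = rng.normal(size=(K, 4))    # member i proposes action a_i = w_i . s
critic_params = rng.normal(size=(K, 4))   # member i estimates value v_i . s (action ignored for brevity)

weights = np.ones(K) / K                  # online ensemble weights over members
eta = 0.5                                 # weight learning rate (assumed constant)

def select_action(state):
    """Act with the weight-averaged ensemble policy (one possible combination rule)."""
    return float(weights @ (actor_params @ state))

def update_weights(state, reward, next_state, gamma=0.99):
    """Shift weight toward members with low TD error (illustrative softmax-style rule)."""
    global weights
    td_error = np.abs(reward + gamma * critic_params @ next_state
                      - critic_params @ state)
    weights = weights * np.exp(-eta * td_error)   # multiplicative-weights update
    weights /= weights.sum()

# One simulated transition on a toy 4-dimensional state space.
s, s_next = rng.normal(size=4), rng.normal(size=4)
a = select_action(s)
update_weights(s, reward=1.0, next_state=s_next)
print(f"action={a:.3f}, weights={np.round(weights, 3)}")
```

Over many transitions, the weights concentrate on the members whose critics track the observed returns best, which is what removes the need to pick a single hyperparameter set up front.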
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
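
A minimal sketch of action quantization for offline RL: the paper learns its discretization adaptively, whereas here a plain k-means codebook fitted to the offline action distribution stands in as the adaptive scheme. The dataset and codebook size are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical offline dataset of 2-D continuous actions.
actions = rng.normal(size=(1000, 2))

def fit_codebook(data, n_codes=8, iters=20):
    """Plain k-means: learn a discrete codebook adapted to the action distribution."""
    codes = data[rng.choice(len(data), n_codes, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - codes[None]) ** 2).sum(-1), axis=1)
        for k in range(n_codes):
            if np.any(assign == k):
                codes[k] = data[assign == k].mean(axis=0)
    return codes

codebook = fit_codebook(actions)

def quantize(a):
    """Map a continuous action to its nearest discrete code index."""
    return int(np.argmin(((codebook - a) ** 2).sum(-1)))

a = np.array([0.3, -0.7])
idx = quantize(a)
print(f"action {a} -> code {idx} = {np.round(codebook[idx], 3)}")
```

An offline RL method such as IQL or CQL would then operate over the code indices rather than raw continuous actions.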
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
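
To make the two-level structure concrete, the sketch below runs an EXP3-style outer bandit over a discretized grid of UCB exploration coefficients for an inner stochastic bandit. The discretization and all constants are assumptions; the paper's framework tunes the hyperparameter continuously.

```python
import numpy as np

rng = np.random.default_rng(2)

true_means = np.array([0.2, 0.5, 0.8])    # inner 3-armed bandit (means unknown)

# Outer layer: candidate UCB exploration coefficients (a discretized stand-in
# for the paper's continuous tuning framework).
alphas = np.array([0.1, 0.5, 1.0, 2.0])
log_w = np.zeros(len(alphas))
eta, mix = 0.1, 0.1

counts, sums = np.zeros(3), np.zeros(3)

for t in range(1, 2001):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    p = (1 - mix) * p + mix / len(alphas)       # keep exploring candidates
    j = rng.choice(len(alphas), p=p)            # pick an exploration coefficient
    if np.any(counts == 0):                     # play each inner arm once first
        arm = int(np.argmin(counts))
    else:
        ucb = sums / counts + alphas[j] * np.sqrt(np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    sums[arm] += reward
    log_w[j] += eta * reward / p[j]             # EXP3-style importance-weighted update

print("preferred exploration coefficient:", alphas[int(np.argmax(log_w))])
```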
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL [28.31529154045046]
We propose a new approach to tune hyperparameters from offline logs of data.
We first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model.
We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
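
A toy version of that recipe, with a tabular "calibration model" fitted from logged transitions of a four-state chain and Q-learning simulated inside it for several candidate learning rates. The environment, candidate values, and episode counts are invented for illustration.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)

# Hypothetical offline log on a 4-state chain: action 1 moves right, 0 moves
# left, and acting in state 3 yields reward 1.
logs = [(s, a, float(s == 3), min(max(s + (1 if a else -1), 0), 3))
        for s in range(4) for a in (0, 1) for _ in range(25)]

# 1) Fit the calibration model: empirical transition/reward tables.
model = defaultdict(list)
for s, a, r, s2 in logs:
    model[(s, a)].append((r, s2))

def model_step(s, a):
    """Sample the learned calibration model instead of the real environment."""
    r, s2 = model[(s, a)][rng.integers(len(model[(s, a)]))]
    return r, s2

# 2) Simulate learning inside the model for each candidate hyperparameter.
def simulated_return(alpha, episodes=200, gamma=0.9, eps=0.1):
    Q, total = np.zeros((4, 2)), 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            r, s2 = model_step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            total += r
            s = s2
    return total

for alpha in (0.01, 0.1, 0.5):
    print(f"alpha={alpha}: return in calibration model = {simulated_return(alpha):.0f}")
```

The candidate with the best simulated return would then be deployed for real training, without ever tuning against the real system.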
arXiv Detail & Related papers (2022-05-18T04:26:23Z)
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot quadruped and a radio-controlled (RC) car.
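
The sketch below shows the overall pattern under strong simplifications: a linear dynamics model is fitted by least squares (standing in for the paper's neural network), and a goal-reaching action sequence is then optimized by backpropagating the terminal cost through the learned rollout. All matrices and constants are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground-truth (unknown) dynamics, used only to generate training data.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# 1) Learn a dynamics model from random transitions via least squares.
S = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
S_next = S @ A_true.T + U @ B_true.T
X = np.hstack([S, U])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)   # S_next ~ X @ W
A, B = W[:2].T, W[2:].T

# 2) Gradient-based trajectory optimization through the learned model.
T, goal = 20, np.array([1.0, 0.0])
u = np.zeros((T, 1))
for _ in range(200):
    s = np.zeros(2)
    states = [s]
    for t in range(T):                            # forward rollout
        s = A @ s + B @ u[t]
        states.append(s)
    lam = 2 * (states[-1] - goal)                 # dJ/ds_T for J = ||s_T - goal||^2
    grads = np.zeros_like(u)
    for t in reversed(range(T)):                  # backward pass through dynamics
        grads[t] = B.T @ lam
        lam = A.T @ lam
    u -= 0.05 * grads

print("final state:", np.round(states[-1], 3), "goal:", goal)
```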
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
- Hyperparameter Tuning for Deep Reinforcement Learning Applications [0.3553493344868413]
We propose a distributed variable-length genetic algorithm framework to tune hyperparameters for various RL applications.
Our results show that with more generations, the algorithm finds optimal solutions that require fewer training episodes and are computationally cheap, while being more robust for deployment.
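
A compact sketch of a variable-length genetic algorithm over hyperparameter genomes. The genome layout, the toy fitness proxy, and the mutation rates are assumptions; a real framework would evaluate fitness by actually training each RL agent, and the paper additionally distributes those evaluations.

```python
import math
import random

random.seed(5)

# Variable-length genome: a list of hidden-layer sizes plus a learning rate,
# standing in for an RL agent's hyperparameters.
def random_genome():
    return {"lr": 10 ** random.uniform(-4, -1),
            "layers": [random.choice([32, 64, 128])
                       for _ in range(random.randint(1, 4))]}

def fitness(g):
    # Toy proxy for "train the agent and measure return": peaks at lr=1e-3
    # and two 64-unit layers. A real framework would run training here.
    lr_score = -abs(math.log10(g["lr"]) + 3)
    size_score = -abs(len(g["layers"]) - 2) - sum(abs(h - 64) for h in g["layers"]) / 128
    return lr_score + size_score

def crossover(a, b):
    cut = random.randint(0, min(len(a["layers"]), len(b["layers"])))
    return {"lr": random.choice([a["lr"], b["lr"]]),
            "layers": a["layers"][:cut] + b["layers"][cut:]}

def mutate(g):
    if random.random() < 0.3:
        g["layers"].append(random.choice([32, 64, 128]))   # genome can grow...
    if random.random() < 0.3 and len(g["layers"]) > 1:
        g["layers"].pop()                                  # ...or shrink
    return g

pop = [random_genome() for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                     # elitist selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(10)]
print("best genome:", max(pop, key=fitness))
```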
arXiv Detail & Related papers (2022-01-26T20:43:13Z)
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
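
The paper's optimiser uses implicit differentiation; as a simpler single-run illustration of hypergradient-based tuning, the sketch below adapts a learning rate online with the classic hypergradient-descent rule dL/d(lr) = -g_t . g_{t-1} on a toy regression problem. The problem, step sizes, and bounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy least-squares problem standing in for a training run.
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)

w = np.zeros(5)
lr = 1e-3                        # hyperparameter tuned during the single run
beta = 1e-5                      # hypergradient step size (assumed)
prev_grad = np.zeros(5)

for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    # Since w_t = w_{t-1} - lr * g_{t-1}, the loss gradient w.r.t. lr is
    # dL/d(lr) = -g_t . g_{t-1}; descend on it to adapt lr online.
    lr = max(lr + beta * (grad @ prev_grad), 1e-6)
    w -= lr * grad
    prev_grad = grad

print(f"tuned lr={lr:.4f}, final loss={np.mean((X @ w - y) ** 2):.2e}")
```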
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curriculum paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation across platforms as a few-shot meta-learning problem, where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- Hyperparameter Auto-tuning in Self-Supervised Robotic Learning [12.193817049957733]
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
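
One simple reading of ELBO-based auto-tuning is a stopping rule that ends training once the windowed ELBO improvement plateaus, avoiding both insufficient and redundant learning. The synthetic ELBO trace, window size, and tolerance below are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic "ELBO per epoch" trace: rises, then plateaus with noise (a
# stand-in for a self-supervised model's real training signal).
elbo = -np.exp(-np.arange(200) / 30.0) + rng.normal(scale=0.005, size=200)

def auto_stop(trace, window=10, tol=1e-3):
    """Stop training once the windowed ELBO improvement falls below tol."""
    for t in range(2 * window, len(trace)):
        recent = np.mean(trace[t - window:t])
        older = np.mean(trace[t - 2 * window:t - window])
        if recent - older < tol:
            return t
    return len(trace)

print("auto-tuned stopping epoch:", auto_stop(elbo))
```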
arXiv Detail & Related papers (2020-10-16T08:58:24Z)
- Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies [41.13416324282365]
We propose a framework which entails the application of Evolutionary Strategies to online hyperparameter tuning in off-policy learning.
Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces.
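
A minimal sketch of the ES loop: Gaussian perturbations of a log-scaled hyperparameter vector are scored by a black-box objective, and the mean is moved along the standard ES gradient estimate. The quadratic toy objective stands in for "run the off-policy learner and report return", and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hyperparameters tuned in log space: [log10 learning rate, log10 noise scale].
theta = np.array([-4.0, -1.5])
sigma, alpha, pop = 0.2, 0.05, 16

def score(h):
    # Toy black-box objective standing in for "run the off-policy learner with
    # these hyperparameters and report return"; it peaks at lr=1e-3, noise=0.1.
    return -((h[0] + 3.0) ** 2 + (h[1] + 1.0) ** 2) + rng.normal(scale=0.1)

for _ in range(100):
    eps = rng.normal(size=(pop, 2))
    rewards = np.array([score(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += alpha / (pop * sigma) * eps.T @ rewards   # ES gradient estimate

print("tuned hyperparameters:", np.round(10 ** theta, 4))
```

The low-dimensional search space is what keeps this black-box approach cheap relative to perturbing the policy weights themselves.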
arXiv Detail & Related papers (2020-06-13T03:54:26Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
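
The sketch below shows the MPC side of that connection: an MPPI-style controller samples action sequences through a (possibly biased) model, bootstraps each rollout with a value estimate at the horizon, and exponentially weights the samples. The toy point-mass model and quadratic value function are assumptions standing in for learned components.

```python
import numpy as np

rng = np.random.default_rng(8)

def model_step(s, a):
    """Toy (possibly biased) point-mass model: the state drifts by 0.1 * action."""
    return s + 0.1 * a

def value(s):
    """Stand-in for a learned value/Q estimate used to bootstrap the horizon."""
    return -np.sum(s ** 2)

def mppi(s0, horizon=15, samples=64, lam=1.0):
    """Information-theoretic MPC: weight sampled controls by exponentiated return."""
    noise = rng.normal(size=(samples, horizon, 2))
    returns = np.zeros(samples)
    for k in range(samples):
        s = s0.copy()
        for t in range(horizon):
            s = model_step(s, noise[k, t])
            returns[k] += -0.1 * np.sum(s ** 2)       # running reward
        returns[k] += value(s)                         # bootstrap with value estimate
    w = np.exp((returns - returns.max()) / lam)
    w /= w.sum()
    return (w[:, None, None] * noise).sum(axis=0)[0]   # first action of the plan

s = np.array([1.0, -1.0])
for _ in range(20):
    s = model_step(s, mppi(s))
print("state after 20 MPPI steps:", np.round(s, 3))
```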
arXiv Detail & Related papers (2019-12-31T00:29:22Z)