Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
- URL: http://arxiv.org/abs/2010.08252v4
- Date: Thu, 25 Mar 2021 03:30:59 GMT
- Title: Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
- Authors: Jiancong Huang, Juan Rojas, Matthieu Zimmer, Hongmin Wu, Yisheng Guan,
and Paul Weng
- Abstract summary: Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
- Score: 12.193817049957733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy optimization in reinforcement learning requires the selection of
numerous hyperparameters across different environments. Fixing them incorrectly
can degrade optimization performance, leading notably to insufficient or
redundant learning. Insufficient learning (due to convergence to local
optima) results in under-performing policies whilst redundant learning wastes
time and resources. The effects are further exacerbated when using single
policies to solve multi-task learning problems. Observing that the Evidence
Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the
diversity of image samples, we propose an auto-tuning technique based on the
ELBO for self-supervised reinforcement learning. Our approach can auto-tune
three hyperparameters: the replay buffer size, the number of policy gradient
updates during each epoch, and the number of exploration steps during each
epoch. We use a state-of-the-art self-supervised robot learning framework
(Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as
the baseline for experimental verification. Experiments show that our method can
auto-tune online and yields the best performance at a fraction of the time and
computational resources. Code, video, and appendix for simulated and real-robot
experiments can be found at the project page www.JuanRojas.net/autotune.
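To make the core signal concrete, below is a minimal sketch, assuming a small fully-connected VAE over flattened image observations and a toy plateau rule; the monitor, thresholds, and hyperparameter names are illustrative and not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a toy VAE ELBO monitor plus a
# plateau rule that adjusts the three hyperparameters named in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, obs_dim=64 * 64 * 3, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 256)
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, obs_dim))

    def elbo(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(z)
        # Gaussian reconstruction log-likelihood (up to a constant) minus KL.
        recon_ll = -F.mse_loss(recon, x, reduction="none").sum(-1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
        return (recon_ll - kl).mean()

def adjust_hyperparameters(elbo_history, hp, eps=1.0):
    """Toy plateau rule (illustrative): once the ELBO on fresh observations
    stops improving, sample diversity has saturated, so shrink exploration
    and gradient updates and stop growing the replay buffer."""
    if len(elbo_history) >= 2 and elbo_history[-1] - elbo_history[-2] < eps:
        hp["exploration_steps"] = max(hp["exploration_steps"] // 2, 100)
        hp["policy_updates"] = max(hp["policy_updates"] // 2, 10)
        hp["grow_buffer"] = False  # cap the replay buffer at its current size
    return hp
```

In a RIG-style loop the ELBO would be re-evaluated on freshly collected observations each epoch: while it keeps rising, new samples are still diverse and further exploration and updates remain worthwhile; once it plateaus, further learning is likely redundant.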
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as more intuitive, human-like handling overall.
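As a rough illustration (a toy point-mass, not the paper's simulator), here is a minimal analytic-policy-gradient sketch where autograd carries gradients of the environment dynamics into the policy update:

```python
# Minimal sketch (a toy point-mass, not the paper's simulator): analytic
# policy gradients obtained by backpropagating through differentiable
# environment dynamics inside the training loop.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout_loss(horizon=50, dt=0.1):
    pos, vel = torch.zeros(2), torch.zeros(2)
    goal = torch.tensor([1.0, 1.0])
    loss = torch.zeros(())
    for _ in range(horizon):
        action = policy(torch.cat([pos, vel]))   # differentiable policy
        vel = vel + dt * action                  # differentiable dynamics step
        pos = pos + dt * vel
        loss = loss + (pos - goal).pow(2).sum()  # tracking cost
    return loss / horizon

for step in range(200):
    opt.zero_grad()
    rollout_loss().backward()  # gradients flow through the dynamics themselves
    opt.step()
```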
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can yield improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
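The relabeling idea is simple enough to sketch; the snippet below is illustrative (not the RLIF code), treating the occurrence of an intervention as the only reward:

```python
# Illustrative sketch (not the RLIF code): the expert's decision to
# intervene is the only reward signal handed to an off-policy RL learner.
def label_transition(obs, action, next_obs, intervened):
    reward = -1.0 if intervened else 0.0  # the intervention event is the reward
    return dict(obs=obs, action=action, next_obs=next_obs,
                reward=reward, done=False)
```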
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insight is to use offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
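For intuition, here is a minimal action-discretization sketch using plain k-means as a stand-in; the paper's scheme is adaptive, so this captures only the general idea, and the helper names are hypothetical:

```python
# Illustrative sketch: plain k-means as a stand-in for the paper's
# adaptive action-quantization scheme.
import numpy as np

def kmeans_actions(actions, k=32, iters=50, seed=0):
    """actions: (N, action_dim) float array of dataset actions."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)]
    for _ in range(iters):
        # Assign every action to its nearest center, then recompute centers.
        dists = np.linalg.norm(actions[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = actions[assign == j].mean(axis=0)
    return centers, assign  # continuous actions -> k discrete indices
```

A continuous action is then replaced by the index of its nearest center, and a discrete Q-function is trained over those k indices.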
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning [0.38073142980732994]
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best-performing set.
The online weighted Q-ensemble showed lower overall variance and superior results compared with Q-average ensembles.
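A minimal sketch of how such a weighted Q-ensemble could act and re-weight online; the exponential re-weighting rule here is an assumption for illustration, not the paper's exact mechanism:

```python
# Illustrative sketch (the weighting rule is an assumption, not the
# paper's exact mechanism): an online weighted Q-ensemble over members
# trained with different hyperparameters.
import numpy as np

class WeightedQEnsemble:
    def __init__(self, q_functions, lr=0.05):
        self.qs = q_functions  # callables: q(obs) -> per-action value array
        self.w = np.full(len(q_functions), 1.0 / len(q_functions))
        self.lr = lr

    def act(self, obs):
        q_vals = np.stack([q(obs) for q in self.qs])  # (members, actions)
        return int((self.w @ q_vals).argmax())        # weighted greedy action

    def update_weights(self, obs, action, target):
        # Members whose value estimate is closer to the observed target get
        # exponentially more weight (multiplicative-weights style update).
        errors = np.array([abs(q(obs)[action] - target) for q in self.qs])
        logits = np.log(self.w + 1e-8) - self.lr * errors
        self.w = np.exp(logits - logits.max())
        self.w /= self.w.sum()
```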
arXiv Detail & Related papers (2022-09-29T19:57:43Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
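For readers unfamiliar with QD bookkeeping, here is a minimal MAP-Elites-style archive sketch (the generic data structure, not RF-QD itself):

```python
# Illustrative sketch of generic Quality-Diversity bookkeeping
# (a MAP-Elites-style grid, not the RF-QD algorithm itself).
import numpy as np

class QDArchive:
    def __init__(self, bins=10):
        self.bins = bins
        self.cells = {}  # grid cell -> (fitness, policy)

    def insert(self, policy, fitness, descriptor):
        """descriptor: behaviour descriptor in [0, 1]^d."""
        cell = tuple(np.clip((np.asarray(descriptor) * self.bins).astype(int),
                             0, self.bins - 1))
        if cell not in self.cells or fitness > self.cells[cell][0]:
            self.cells[cell] = (fitness, policy)  # keep one elite per cell
            return True
        return False
```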
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
The deep imitative reinforcement learning (DIRL) approach achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- Episodic Self-Imitation Learning with Hindsight [7.743320290728377]
Episodic self-imitation learning is a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function.
A selection module is introduced to filter uninformative samples from each episode during the update.
Episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces.
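One common selection criterion in self-imitation, used here purely as an illustration (the paper's module and adaptive loss may differ), is to keep only transitions whose observed return beats the agent's value estimate:

```python
# Illustrative sketch of a self-imitation-style selection criterion
# (the paper's module and adaptive loss may differ): keep only
# transitions whose observed return beats the agent's value estimate.
def select_samples(episode, value_fn, gamma=0.99):
    returns, g = [], 0.0
    for transition in reversed(episode):  # discounted return-to-go
        g = transition["reward"] + gamma * g
        returns.append(g)
    returns.reverse()
    return [t for t, g in zip(episode, returns)
            if g > value_fn(t["obs"])]    # filter uninformative samples
```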
arXiv Detail & Related papers (2020-11-26T20:36:42Z)
- Deep Surrogate Q-Learning for Autonomous Driving [17.30342128504405]
We propose Surrogate Q-learning for learning lane-change behavior for autonomous driving.
We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay.
We also show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.
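A minimal sketch of what scene-centric replay might look like, assuming one buffer entry per scene with per-vehicle transitions; the class and method names are hypothetical:

```python
# Illustrative sketch (class and method names are hypothetical): a
# scene-centric replay buffer storing one entry per traffic scene and
# emitting a transition for every vehicle in each sampled scene.
import random

class SceneCentricReplay:
    def __init__(self, capacity=10_000):
        self.scenes = []
        self.capacity = capacity

    def add_scene(self, scene):
        """scene: list of per-vehicle transitions from one timestep."""
        if len(self.scenes) >= self.capacity:
            self.scenes.pop(0)  # drop the oldest scene
        self.scenes.append(scene)

    def sample(self, batch_scenes=8):
        batch = []
        for scene in random.sample(self.scenes,
                                   min(batch_scenes, len(self.scenes))):
            batch.extend(scene)  # every vehicle in the scene contributes
        return batch
```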
arXiv Detail & Related papers (2020-10-21T19:49:06Z)