SHIRO: Soft Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2212.12786v1
- Date: Sat, 24 Dec 2022 17:21:58 GMT
- Title: SHIRO: Soft Hierarchical Reinforcement Learning
- Authors: Kandai Watanabe, Mathew Strong, Omer Eldar
- Abstract summary: We present an Off-Policy HRL algorithm that maximizes entropy for efficient exploration.
The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level policy.
Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated
to perform well on high-dimensional decision making and robotic control tasks.
However, because they solely optimize for rewards, the agent tends to search
the same space redundantly. This slows learning and lowers the achieved
reward. In this work, we present an Off-Policy HRL algorithm that
maximizes entropy for efficient exploration. The algorithm learns a temporally
abstracted low-level policy and is able to explore broadly through the addition
of entropy to the high-level policy. The novelty of this work is the theoretical
motivation for adding entropy to the RL objective in the HRL setting. We
empirically show that the entropy can be added to both levels if the
Kullback-Leibler (KL) divergence between consecutive updates of the low-level
policy is sufficiently small. We performed an ablation study to analyze the
effects of entropy on the hierarchy, in which adding entropy to the high-level
policy emerged as the most desirable configuration. Furthermore, a higher
temperature in the low-level policy leads to Q-value overestimation and
increases the stochasticity of the environment that the high-level policy
operates on, making learning more
challenging. Our method, SHIRO, surpasses state-of-the-art performance on a
range of simulated robotic control benchmark tasks and requires minimal tuning.
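To make the idea concrete, below is a minimal sketch of how an entropy bonus could enter the high-level update of a goal-conditioned, two-level off-policy agent (a HIRO-style setup is assumed; the module names, the `sample`/`rsample` interfaces, and the temperature `alpha_hi` are illustrative placeholders, not the authors' implementation):

```python
import torch

def soft_high_level_target(high_policy, q1_targ, q2_targ,
                           next_state, reward, done, alpha_hi, gamma=0.99):
    """SAC-style critic target for the high level: the high-level 'action' is a
    subgoal g, and an entropy bonus -alpha_hi * log pi(g|s') enters the value."""
    with torch.no_grad():
        g_next, logp_g = high_policy.sample(next_state)        # subgoal and its log-prob
        q_next = torch.min(q1_targ(next_state, g_next), q2_targ(next_state, g_next))
        return reward + gamma * (1.0 - done) * (q_next - alpha_hi * logp_g)

def high_level_policy_loss(high_policy, q1, q2, state, alpha_hi):
    """Reparameterized actor loss: maximize the soft Q-value of the sampled subgoal."""
    g, logp_g = high_policy.rsample(state)
    q = torch.min(q1(state, g), q2(state, g))
    return (alpha_hi * logp_g - q).mean()
```

Per the abstract, entropy can also be added to the low level, provided the KL divergence between consecutive low-level policy updates stays sufficiently small; monitoring that divergence on a minibatch would be one practical check (an assumption, not a detail stated in the abstract).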
Related papers
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
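As a rough illustration of what token-level entropy regularization can look like (a generic sketch, not ETPO's actual objective; the loss shape and the coefficient `beta` are assumptions):

```python
import torch

def token_entropy_regularized_loss(logits, actions, advantages, beta=0.01):
    """Policy-gradient loss over generated tokens with a per-token entropy bonus.
    logits: (batch, seq, vocab); actions, advantages: (batch, seq)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)   # entropy of each token distribution
    return -(advantages * token_logp + beta * entropy).mean()
```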
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning [25.84621883831624]
We present CRISP, a novel HRL algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives.
CRISP uses the lower level primitive to periodically perform data relabeling on a handful of expert demonstrations.
We show that CRISP demonstrates impressive generalization in real-world scenarios.
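One plausible reading of this relabeling step, sketched under assumptions (the reachability check `can_reach` and the segment horizon are hypothetical, not taken from the paper):

```python
def relabel_demonstrations(demos, can_reach, horizon=25):
    """Split expert demonstrations into (start, subgoal) pairs, pulling each subgoal
    back until the current low-level primitive can plausibly reach it."""
    curriculum = []
    for demo in demos:                       # demo: list of states
        t = 0
        while t < len(demo) - 1:
            end = min(t + horizon, len(demo) - 1)
            while end > t + 1 and not can_reach(demo[t], demo[end]):
                end -= 1                     # shrink the segment to a reachable subgoal
            curriculum.append((demo[t], demo[end]))
            t = end
    return curriculum
```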
arXiv Detail & Related papers (2023-04-07T08:22:50Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
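A generic way to obtain a heavy-tailed exploration policy (a sketch only; the parameterization below is not necessarily the one used by HT-PSG) is to sample actions from a Student-t distribution instead of a Gaussian:

```python
import torch
from torch.distributions import StudentT

def sample_heavy_tailed_action(mean, scale, df=2.0):
    """Student-t action noise: small df gives heavy tails, so occasional large
    exploratory actions occur, which helps under sparse rewards."""
    dist = StudentT(df=torch.tensor(df), loc=mean, scale=scale)
    action = dist.sample()
    log_prob = dist.log_prob(action).sum(-1)   # independent per-dimension t noise
    return action, log_prob
```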
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
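The core loop can be sketched as short-horizon backpropagation through the differentiable simulator with a critic supplying the tail value (names such as `sim_step` and `critic` are placeholders; this compresses the published algorithm considerably):

```python
def short_horizon_policy_loss(policy, critic, sim_step, state, horizon=16, gamma=0.99):
    """Differentiate the discounted return through `horizon` simulator steps,
    then bootstrap with a smooth critic so gradient paths stay short."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = sim_step(state, action)   # differentiable dynamics and reward
        total = total + discount * reward
        discount *= gamma
    total = total + discount * critic(state)       # smooth terminal value estimate
    return -total.mean()
```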
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
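The "what and when" of a timed subgoal can be represented as a small structure passed from the high level to the low level (a schematic interface assumed for illustration, not the paper's exact one):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    goal_state: np.ndarray   # desired state for the low level to reach
    deadline: int            # environment step at which it should be reached

def low_level_reward(achieved, subgoal: TimedSubgoal, step: int, tol=0.05):
    """Sparse reward: success only if the agent is near the goal when the
    specified time arrives (one plausible reading of timed subgoals)."""
    on_time = step == subgoal.deadline
    close = float(np.linalg.norm(achieved - subgoal.goal_state)) < tol
    return float(on_time and close)
```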
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- A Max-Min Entropy Framework for Reinforcement Learning [16.853711292804476]
We propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the maximum entropy RL framework.
For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework.
Numerical results show that the proposed algorithm yields drastic performance improvement over the current state-of-the-art RL algorithms.
arXiv Detail & Related papers (2021-06-19T15:30:21Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
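Self-supervised option discovery is commonly driven by a discriminator-style intrinsic reward; a generic sketch along those lines (not necessarily HIDIO's exact objective):

```python
import torch

def intrinsic_option_reward(discriminator, next_state, option):
    """Reward the low level for visiting states from which the sampled option can
    be inferred: r = log q(option | s')."""
    logits = discriminator(next_state)             # (batch, num_options)
    log_q = torch.log_softmax(logits, dim=-1)
    return log_q.gather(-1, option.unsqueeze(-1)).squeeze(-1)
```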
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
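A finite reward automaton can be encoded as a set of abstract states with transitions on high-level events, each emitting a reward; a toy encoding is shown below (the actual automaton is learned from queries and counterexamples via L*, which is not shown):

```python
from dataclasses import dataclass, field

@dataclass
class RewardAutomaton:
    """Finite reward automaton: transitions on high-level events emit rewards."""
    start: str
    delta: dict = field(default_factory=dict)   # (state, event) -> (next_state, reward)

    def step(self, state, event):
        # Unknown events self-loop with zero reward.
        return self.delta.get((state, event), (state, 0.0))

# Toy task: pick up the key, then open the door for reward 1.
ra = RewardAutomaton(start="u0", delta={
    ("u0", "key"): ("u1", 0.0),
    ("u1", "door"): ("u2", 1.0),
})
```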
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.