Temporal-adaptive Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2002.02080v1
- Date: Thu, 6 Feb 2020 02:52:21 GMT
- Title: Temporal-adaptive Hierarchical Reinforcement Learning
- Authors: Wen-Ji Zhou, Yang Yu
- Abstract summary: Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning.
We propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to adaptively control the high-level policy decision frequency.
We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games.
- Score: 7.571460904033682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning (HRL) helps address large-scale and
sparse reward issues in reinforcement learning. In HRL, the policy model has an
inner representation structured in levels. With this structure, the
reinforcement learning task is expected to be decomposed into corresponding
levels with sub-tasks, and thus the learning can be more efficient. In HRL,
although it is intuitive that a high-level policy only needs to make macro
decisions at a low frequency, the exact frequency is hard to determine.
Previous HRL approaches often employ a fixed time-skip strategy or learn a
termination condition without taking the context into account, which not
only requires manual adjustment but also sacrifices decision granularity.
In this paper, we propose the \emph{temporal-adaptive hierarchical
policy learning} (TEMPLE) structure, which uses a temporal gate to adaptively
control the high-level policy decision frequency. We train the TEMPLE structure
with PPO and test its performance in a range of environments including 2-D
rooms, Mujoco tasks, and Atari games. The results show that the TEMPLE
structure can lead to improved performance in these environments with a
sequential adaptive high-level control.
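To make the proposed structure concrete, below is a minimal sketch of a temporal-gated two-level policy. It assumes a PyTorch model with a Bernoulli gate and an illustrative option dimensionality; it is not the authors' implementation, and the PPO update itself is omitted.

```python
# Minimal sketch of a temporal-adaptive two-level policy: a learned gate
# decides per step whether the high-level policy issues a new decision or the
# previous one is carried over. Sizes and the Bernoulli gate are assumptions.
import torch
import torch.nn as nn

class TemporalAdaptivePolicy(nn.Module):
    def __init__(self, obs_dim, option_dim, action_dim, hidden=64):
        super().__init__()
        # Temporal gate: probability of refreshing the high-level decision now.
        self.gate = nn.Sequential(
            nn.Linear(obs_dim + option_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        # High-level policy: proposes a new option (latent macro decision).
        self.high = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, option_dim))
        # Low-level policy: maps (observation, active option) to action logits.
        self.low = nn.Sequential(
            nn.Linear(obs_dim + option_dim, hidden), nn.Tanh(), nn.Linear(hidden, action_dim))

    def forward(self, obs, prev_option):
        gate_prob = torch.sigmoid(self.gate(torch.cat([obs, prev_option], dim=-1)))
        refresh = torch.bernoulli(gate_prob)          # 1 -> take a new high-level decision
        new_option = torch.tanh(self.high(obs))
        option = refresh * new_option + (1.0 - refresh) * prev_option
        action_logits = self.low(torch.cat([obs, option], dim=-1))
        return action_logits, option, gate_prob

# Usage: roll the policy out, sample actions from `action_logits`, and train
# all three heads jointly with PPO, treating the gate sample as part of the
# stochastic policy.
policy = TemporalAdaptivePolicy(obs_dim=8, option_dim=4, action_dim=3)
obs, option = torch.zeros(1, 8), torch.zeros(1, 4)
logits, option, gate_prob = policy(obs, option)
```

The point of the sketch is the design choice: the gate, rather than a fixed time skip, decides at every step whether the high-level decision is refreshed, so the effective decision frequency adapts to the context.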
Related papers
- Hierarchical Continual Reinforcement Learning via Large Language Model [15.837883929274758]
Hi-Core is designed to facilitate the transfer of high-level knowledge.
It orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM).
Hi-Core has demonstrated its effectiveness in handling diverse continual reinforcement learning (CRL) tasks, outperforming popular baselines.
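A rough, hypothetical illustration of such a two-layer arrangement (not Hi-Core's code): an LLM is prompted to formulate a high-level goal, and a lower-level learned policy acts toward it. Here `query_llm` and `low_level_policy` are placeholder callables and `env` is assumed to follow the classic Gym API.

```python
# Hypothetical sketch of an LLM-guided two-layer control loop (not Hi-Core's code).
def llm_guided_episode(env, query_llm, low_level_policy, steps_per_goal=50):
    obs = env.reset()
    done = False
    while not done:
        # High level: prompt the LLM to formulate the next high-level goal
        # from a textual description of the situation (prompt format assumed).
        goal = query_llm(f"Current state: {obs}. Propose the next high-level goal.")
        # Low level: a learned policy pursues that goal for a fixed budget.
        for _ in range(steps_per_goal):
            action = low_level_policy(obs, goal)
            obs, reward, done, info = env.step(action)
            if done:
                break
```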
arXiv Detail & Related papers (2024-01-25T03:06:51Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
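The paper's quantization scheme is learned adaptively from the data; as a loose illustration of the general idea of acting over a learned discrete codebook instead of a continuous action space, here is a toy k-means quantizer. The clustering choice and the sizes are assumptions, not the paper's method.

```python
# Toy stand-in (not the paper's scheme) for learning a discrete action
# codebook from an offline dataset: cluster the continuous actions with
# k-means and act over cluster indices.
import numpy as np

def fit_action_codebook(actions, n_codes=32, iters=50, seed=0):
    """k-means over dataset actions -> (n_codes, act_dim) codebook."""
    rng = np.random.default_rng(seed)
    codebook = actions[rng.choice(len(actions), size=n_codes, replace=False)]
    for _ in range(iters):
        # Assign every action to its nearest code, then recompute code centers.
        dists = np.linalg.norm(actions[:, None] - codebook[None], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_codes):
            members = actions[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest code."""
    return int(np.linalg.norm(codebook - action, axis=-1).argmin())

# Usage: discretize the dataset once, train an offline RL method (e.g. IQL or
# CQL) over the code indices, and decode codebook[index] back to a continuous
# action at execution time.
dataset_actions = np.random.uniform(-1.0, 1.0, size=(10000, 6))
codebook = fit_action_codebook(dataset_actions)
idx = quantize(dataset_actions[0], codebook)
```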
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning [48.75878234995544]
We propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection.
We validate Skill-Critic in multiple sparse-reward environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport.
arXiv Detail & Related papers (2023-06-14T09:24:32Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
- Emergency action termination for immediate reaction in hierarchical reinforcement learning [8.637919344171255]
We propose a method in which the validity of higher-level actions (thus lower-level goals) is constantly verified at the higher level.
If the actions, i.e., lower-level goals, become inadequate, they are replaced by more appropriate ones.
In this way we combine the advantage of hierarchical RL, fast training, with that of flat RL, immediate reactivity.
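A minimal sketch of this verify-and-replace loop, assuming placeholder callables (`high_policy`, `low_policy`, `goal_still_valid`) and a Gym-style `env`; it is not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation) of per-step goal verification.
def run_episode(env, high_policy, low_policy, goal_still_valid, max_steps=1000):
    obs = env.reset()
    goal = high_policy(obs)                  # initial high-level decision
    for _ in range(max_steps):
        # Emergency termination: re-check the current goal at every step
        # instead of waiting for the low level to finish.
        if not goal_still_valid(obs, goal):
            goal = high_policy(obs)          # replace the inadequate goal immediately
        action = low_policy(obs, goal)       # low level pursues the current goal
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```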
arXiv Detail & Related papers (2022-11-11T16:56:02Z)
- Adversarially Guided Subgoal Generation for Hierarchical Reinforcement Learning [5.514236598436977]
We propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy.
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
arXiv Detail & Related papers (2022-01-24T12:30:38Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
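To make the "what and when" idea concrete, here is a small illustrative sketch (not the HiTS implementation; the names and the simple fixed-deadline loop are assumptions): the high level emits a desired state together with a deadline, and the low level keeps control until that deadline expires.

```python
# Illustrative sketch of a timed subgoal; `env` is assumed to follow the
# classic Gym API and the policies are placeholder callables.
from dataclasses import dataclass

@dataclass
class TimedSubgoal:
    desired_state: tuple   # what goal state should be reached
    deadline: int          # in how many low-level steps it should be reached

def hierarchical_step(env, obs, high_policy, low_policy):
    subgoal = high_policy(obs)                    # returns a TimedSubgoal
    for t in range(subgoal.deadline):
        # The low level sees the goal and the remaining time budget.
        action = low_policy(obs, subgoal.desired_state, subgoal.deadline - t)
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs   # control returns to the high level once the subgoal times out
```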
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns a latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.