Provable Hierarchy-Based Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2110.09507v1
- Date: Mon, 18 Oct 2021 17:56:02 GMT
- Title: Provable Hierarchy-Based Meta-Reinforcement Learning
- Authors: Kurtland Chua, Qi Lei, Jason D. Lee
- Abstract summary: We analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
- Score: 50.17896588738377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning (HRL) has seen widespread interest as an
approach to tractable learning of complex modular behaviors. However, existing
work either assumes access to expert-constructed hierarchies or uses
hierarchy-learning heuristics with no provable guarantees. To address this gap,
we analyze HRL in the meta-RL setting, where a learner learns latent
hierarchical structure during meta-training for use in a downstream task. We
consider a tabular setting where natural hierarchical structure is embedded in
the transition dynamics. Analogous to supervised meta-learning theory, we
provide "diversity conditions" which, together with a tractable optimism-based
algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Furthermore, we provide regret bounds on a learner using the recovered
hierarchy to solve a meta-test task. Our bounds incorporate common notions in
HRL literature such as temporal and state/action abstractions, suggesting that
our setting and analysis capture important features of HRL in practice.
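For intuition, the optimism-based ingredient mentioned in the abstract follows a standard tabular recipe: plan against empirical estimates inflated by a count-based exploration bonus. Below is a minimal sketch of that generic recipe; function names, shapes, and constants are illustrative assumptions, not the paper's actual meta-training algorithm.
```python
import numpy as np

def optimistic_value_iteration(counts, mean_reward, trans_hat, H, c=1.0):
    """One round of optimistic planning (UCBVI-style sketch).

    counts:      (S, A) visit counts for each state-action pair
    mean_reward: (S, A) empirical mean rewards
    trans_hat:   (S, A, S) empirical transition probabilities
    H:           horizon; c: bonus scale (both illustrative)
    """
    S, A = counts.shape
    V = np.zeros(S)            # value at step H is zero
    Q = np.zeros((H, S, A))
    # Count-based bonus: rarely visited pairs look optimistically good.
    bonus = c * np.sqrt(1.0 / np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):
        Q[h] = np.minimum(mean_reward + bonus + trans_hat @ V, H)
        V = Q[h].max(axis=1)
    return Q  # act greedily w.r.t. Q to drive exploration
```
This shows only the generic optimism mechanism; the paper's contribution is to combine it with diversity conditions so that the latent hierarchy is provably recovered.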
Related papers
- Reinforcement Learning with Options and State Representation [105.82346211739433]
This thesis aims to explore the reinforcement learning field and build on existing methods to produce improved ones.
It pursues this goal by decomposing learning tasks hierarchically, an approach known as Hierarchical Reinforcement Learning.
arXiv Detail & Related papers (2024-03-16T08:30:55Z)
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [50.976910714839065]
Context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations.
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $\boldsymbol{M}$ and its latent representation $\boldsymbol{Z}$ by implementing various approximate bounds.
Based on the theoretical insight and the information bottleneck principle, we arrive at a novel algorithm dubbed UNICORN, which exhibits remarkable generalization across a broad spectrum of RL benchmarks.
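For reference, the shared objective the summary alludes to can be written in standard notation; the following reconstruction uses the usual variational treatment of mutual information and is not quoted from the paper.
```latex
% Mutual information between task variable M and latent representation Z,
% with a Barber-Agakov-style variational lower bound; q is an auxiliary
% decoder. Notation reconstructed, not quoted from the paper.
\begin{align}
  I(\boldsymbol{Z}; \boldsymbol{M})
    &= \mathbb{E}_{p(\boldsymbol{M},\boldsymbol{Z})}
       \!\left[\log \frac{p(\boldsymbol{M}\mid\boldsymbol{Z})}{p(\boldsymbol{M})}\right] \\
    &\ge \mathbb{E}_{p(\boldsymbol{M},\boldsymbol{Z})}
       \!\left[\log q(\boldsymbol{M}\mid\boldsymbol{Z})\right] + H(\boldsymbol{M}).
\end{align}
```
On this reading, different COMRL algorithms correspond to different tractable bounds on $I(\boldsymbol{Z}; \boldsymbol{M})$.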
arXiv Detail & Related papers (2024-02-04T09:58:42Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning [25.84621883831624]
Hierarchical reinforcement learning has the potential to solve complex long-horizon tasks using temporal abstraction and increased exploration.
We present primitive enabled adaptive relabeling (PEAR).
We first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision.
We then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL).
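PEAR's exact relabeling criterion is not spelled out in this summary; the sketch below is a hypothetical rendering of the general idea, i.e., scanning each demonstration for the furthest state the current low-level policy can still reach and using it as subgoal supervision. The `can_reach` helper is an assumed interface, not part of PEAR.
```python
# Hypothetical sketch of adaptive relabeling over expert demonstrations:
# walk each trajectory and emit, as a subgoal label, the furthest state the
# current low-level policy can still reach reliably. Illustrative only;
# PEAR's actual criterion and interfaces may differ.
def adaptive_relabel(demo_states, can_reach):
    """demo_states: list of states from one expert demonstration.
    can_reach(start, goal) -> bool: reachability estimate for the
    current low-level policy (assumed helper)."""
    subgoals = []
    i = 0
    while i < len(demo_states) - 1:
        j = i + 1
        # Advance the subgoal as far as the low-level policy can follow.
        while j + 1 < len(demo_states) and can_reach(demo_states[i], demo_states[j + 1]):
            j += 1
        subgoals.append((demo_states[i], demo_states[j]))  # (state, subgoal)
        i = j
    return subgoals
```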
arXiv Detail & Related papers (2023-06-10T09:41:30Z)
- Causality-driven Hierarchical Structure Discovery for Reinforcement Learning [36.03953383550469]
We propose CDHRL, a causality-driven hierarchical reinforcement learning framework.
Results in two complex environments, 2D-Minecraft and Eden, show that CDHRL significantly boosts exploration efficiency through its causality-driven paradigm.
arXiv Detail & Related papers (2022-10-13T12:42:48Z)
- Weakly-supervised Action Localization via Hierarchical Mining [76.00021423700497]
Weakly-supervised action localization aims to temporally localize and classify action instances in videos using only video-level categorical labels.
We propose a hierarchical mining strategy operating at both the video level and the snippet level, i.e., hierarchical supervision and hierarchical consistency mining.
We show that HiM-Net outperforms existing methods on THUMOS14 and ActivityNet1.3 datasets with large margins by hierarchically mining the supervision and consistency.
arXiv Detail & Related papers (2022-06-22T12:19:09Z)
- On Credit Assignment in Hierarchical Reinforcement Learning [0.0]
Hierarchical Reinforcement Learning (HRL) has held longstanding promise to advance reinforcement learning.
We show how, e.g., a 1-step 'hierarchical backup' can be seen as a conventional multistep backup with $n$ skip connections over time.
We develop a new hierarchical algorithm, Hier$Q_k(\lambda)$, for which we demonstrate that hierarchical credit assignment alone can already boost agent performance.
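The skip-connection view can be made concrete with the standard $n$-step backup target; the notation below is a reconstruction in standard form, not the paper's own.
```latex
% A higher-level policy that acts once every n base steps performs a 1-step
% backup that bootstraps n steps ahead, matching the conventional n-step
% target below; the n-1 intermediate states are skipped over in time.
\begin{equation}
  G_t^{(n)} \;=\; \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k}
            \;+\; \gamma^{n} \max_{a} Q(s_{t+n}, a).
\end{equation}
```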
arXiv Detail & Related papers (2022-03-07T11:13:09Z)
- Alchemy: A structured task distribution for meta-reinforcement learning [52.75769317355963]
We introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency.
Alchemy is a 3D video game, which involves a latent causal structure that is resampled procedurally from episode to episode.
We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents.
arXiv Detail & Related papers (2021-02-04T23:40:44Z)
- Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning [36.050432925402845]
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long horizon control tasks.
We experimentally show that our method generalizes across unseen test environments and can scale to 3x horizon length compared to both learning and non-learning based methods.
arXiv Detail & Related papers (2020-02-14T10:19:52Z)
- Temporal-adaptive Hierarchical Reinforcement Learning [7.571460904033682]
Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning.
We propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to adaptively control the high-level policy decision frequency.
We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games.
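The summary does not specify the gate's architecture; the following is a hypothetical sketch of the mechanism it describes, i.e., a learned gate that decides per step whether the high-level policy emits a fresh subgoal or keeps the previous one. Module names and sizes are illustrative, not TEMPLE's.
```python
import torch
import torch.nn as nn

# Hypothetical sketch of a temporal gate in a two-level policy: the gate
# decides, per step, whether the high-level policy issues a new subgoal or
# the previous one is carried forward. Illustrative architecture only.
class GatedHighLevelPolicy(nn.Module):
    def __init__(self, obs_dim, subgoal_dim, hidden=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1), nn.Sigmoid())
        self.policy = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, subgoal_dim))

    def forward(self, obs, prev_subgoal):
        g = self.gate(obs)            # in (0, 1): probability of updating
        new_subgoal = self.policy(obs)
        # Soft update keeps the computation differentiable for PPO-style
        # training; thresholding g at evaluation gives a hard decision.
        return g * new_subgoal + (1 - g) * prev_subgoal
```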
arXiv Detail & Related papers (2020-02-06T02:52:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.