PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2306.06394v5
- Date: Sun, 21 Apr 2024 12:57:38 GMT
- Title: PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning
- Authors: Utsav Singh, Vinay P. Namboodiri
- Abstract summary: Hierarchical reinforcement learning has the potential to solve complex long horizon tasks using temporal abstraction and increased exploration.
We present primitive enabled adaptive relabeling (PEAR).
We first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision.
We then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL).
- Score: 25.84621883831624
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hierarchical reinforcement learning (HRL) has the potential to solve complex long horizon tasks using temporal abstraction and increased exploration. However, hierarchical agents are difficult to train due to inherent non-stationarity. We present primitive enabled adaptive relabeling (PEAR), a two-phase approach where we first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision, and then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL). We perform theoretical analysis to $(i)$ bound the sub-optimality of our approach, and $(ii)$ derive a generalized plug-and-play framework for joint optimization using RL and IL. Since PEAR utilizes only a handful of expert demonstrations and considers minimal limiting assumptions on the task structure, it can be easily integrated with typical off-policy RL algorithms to produce a practical HRL approach. We perform extensive experiments on challenging environments and show that PEAR is able to outperform various hierarchical and non-hierarchical baselines on complex tasks that require long term decision making. We also perform ablations to thoroughly analyse the importance of our various design choices. Finally, we perform real world robotic experiments on complex tasks and demonstrate that PEAR consistently outperforms the baselines.
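The abstract describes the two phases only at a high level; as a rough, non-authoritative illustration, the sketch below shows one way such a scheme could be wired up in PyTorch: a relabeling step that walks along a demonstration and keeps, for each anchor state, the furthest state the lower-level primitive's value estimate still deems reachable as its subgoal, plus a plug-and-play higher-level objective that adds an imitation term on those relabeled pairs to an arbitrary off-policy RL loss. All names here (`relabel_demo`, `joint_higher_loss`, `lower_value`, `il_weight`) are hypothetical and are not taken from the paper.

```python
# Minimal, illustrative sketch of a PEAR-style two-phase scheme (not the authors' code).
# Phase 1: adaptively relabel a few expert demonstrations into (state, subgoal) pairs.
# Phase 2: train the higher-level policy with an RL loss plus an imitation (IL) term.
import torch
import torch.nn as nn


def relabel_demo(demo_states, lower_value, threshold=0.0):
    """Adaptive relabeling (hypothetical): from each anchor state, advance along the
    demonstration while the lower-level primitive's value estimate says the candidate
    state is still reachable, and use the furthest such state as the subgoal."""
    pairs = []
    i = 0
    while i < len(demo_states) - 1:
        j = i + 1
        while j + 1 < len(demo_states) and lower_value(demo_states[i], demo_states[j + 1]) > threshold:
            j += 1
        pairs.append((demo_states[i], demo_states[j]))  # (anchor state, relabeled subgoal)
        i = j
    return pairs


def joint_higher_loss(higher_policy, rl_loss, relabeled_pairs, il_weight=0.5):
    """Plug-and-play joint objective (sketch): any off-policy RL loss for the higher
    level plus a regression-style IL loss toward the relabeled subgoals."""
    states = torch.stack([s for s, _ in relabeled_pairs])
    subgoals = torch.stack([g for _, g in relabeled_pairs])
    il_loss = nn.functional.mse_loss(higher_policy(states), subgoals)
    return rl_loss + il_weight * il_loss
```

Under this reading, `il_weight` trades off the RL and IL terms, and the wrapper is agnostic to the underlying learner, which is consistent with the abstract's claim that PEAR can be integrated with typical off-policy RL algorithms.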
Related papers
- RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning [50.55776190278426]
Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks.
We introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms.
arXiv Detail & Related papers (2024-05-29T22:23:20Z) - BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization [34.24884427152513]
We propose a general meta ERL framework via bilevel optimization (BiERL).
We design an elegant meta-level architecture that embeds the inner-level's evolving experience into an informative population representation.
We perform extensive experiments in MuJoCo and Box2D tasks to verify that as a general framework, BiERL outperforms various baselines and consistently improves the learning performance for a diversity of ERL algorithms.
arXiv Detail & Related papers (2023-08-01T09:31:51Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning [25.84621883831624]
We present CRISP, a novel HRL algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives.
CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations.
We show that CRISP achieves impressive generalization in real-world scenarios.
arXiv Detail & Related papers (2023-04-07T08:22:50Z) - Hypernetworks in Meta-Reinforcement Learning [47.25270748922176]
Multi-task reinforcement learning (RL) and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks.
State-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately.
Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution and are applicable to meta-RL.
arXiv Detail & Related papers (2022-10-20T15:34:52Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to solve this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns a latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)