CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2304.03535v5
- Date: Tue, 24 Sep 2024 17:23:48 GMT
- Title: CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning
- Authors: Utsav Singh, Vinay P. Namboodiri
- Abstract summary: We present CRISP, a novel HRL algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives.
CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations.
We show that CRISP achieves impressive generalization in real-world scenarios.
- Score: 25.84621883831624
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hierarchical reinforcement learning (HRL) is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, as it is challenging to train the higher-level policy when the lower-level primitive is non-stationary. In this paper, we present CRISP, a novel HRL algorithm that effectively generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations, using a novel primitive informed parsing (PIP) approach, thereby mitigating non-stationarity. Since our approach only assumes access to a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations on complex robotic maze navigation and robotic manipulation tasks demonstrate that inducing hierarchical curriculum learning significantly improves sample efficiency and results in efficient goal-conditioned policies for solving temporally extended tasks. Additionally, we perform real-world robotic experiments on complex manipulation tasks and show that CRISP achieves impressive generalization in real-world scenarios.
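The relabeling idea in the abstract can be illustrated with a toy sketch. The abstract does not give the details of primitive informed parsing, so everything below is an assumption made for illustration: states are 1-D floats, `reachable` is a hypothetical stand-in for whatever signal the lower-level primitive provides (for example, a thresholded value estimate), and `parse_subgoals` and `make_reachability` are illustrative names, not the authors' API. The sketch greedily picks, from each anchor state in a demonstration, the farthest state the current primitive is judged able to reach.

```python
from typing import Callable, List, Sequence

State = float  # toy 1-D state (assumption); real tasks would use vectors


def parse_subgoals(
    demo: Sequence[State],
    reachable: Callable[[State, State], bool],
) -> List[State]:
    """Greedily relabel a demonstration with subgoals.

    From the current anchor, advance to the farthest demonstration state
    that `reachable` accepts, record it as a subgoal, and repeat from there.
    """
    subgoals: List[State] = []
    anchor = 0
    last = len(demo) - 1
    while anchor < last:
        nxt = anchor + 1  # always advance at least one step
        while nxt < last and reachable(demo[anchor], demo[nxt + 1]):
            nxt += 1
        subgoals.append(demo[nxt])
        anchor = nxt
    return subgoals


def make_reachability(max_reach: float) -> Callable[[State, State], bool]:
    """Hypothetical reachability test: the goal lies within `max_reach`."""
    def reachable(state: State, goal: State) -> bool:
        return abs(goal - state) <= max_reach
    return reachable


demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weak = parse_subgoals(demo, make_reachability(1.0))    # early, weak primitive
strong = parse_subgoals(demo, make_reachability(3.0))  # later, stronger primitive
print(weak)
print(strong)
```

Re-running the parse as the primitive improves yields fewer, more distant subgoals from the same demonstration, which is one way to read the "curriculum of achievable subgoals" the abstract describes.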
Related papers
- DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning [36.50275602760051]
We introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning.
It is an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy and reinforcement learning to learn a lower-level policy.
It enjoys improved computational efficiency due to its use of direct preference optimization instead of standard preference-based approaches.
arXiv Detail & Related papers (2024-06-16T10:49:41Z)
- LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning [22.99690700210957]
We propose a novel HRL framework that leverages language instructions to generate a stationary reward function for a higher-level policy.
Since the language-guided reward is unaffected by the lower primitive behaviour, LGR2 mitigates non-stationarity.
Our approach attains success rates exceeding 70% in challenging, sparse-reward robotic navigation and manipulation environments.
arXiv Detail & Related papers (2024-06-09T18:40:24Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Human-Inspired Framework to Accelerate Reinforcement Learning [1.6317061277457001]
Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency.
This paper introduces a novel human-inspired framework to enhance RL algorithm sample efficiency.
arXiv Detail & Related papers (2023-02-28T13:15:04Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
arXiv Detail & Related papers (2021-12-06T15:11:19Z)