Matching options to tasks using Option-Indexed Hierarchical
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.05750v1
- Date: Sun, 12 Jun 2022 14:39:02 GMT
- Title: Matching options to tasks using Option-Indexed Hierarchical
Reinforcement Learning
- Authors: Kushal Chauhan, Soumya Chatterjee, Akash Reddy, Balaraman Ravindran,
Pradeep Shenoy
- Abstract summary: We propose a novel option indexing approach to hierarchical learning (OI-HRL).
This allows us to effectively reuse a large library of pretrained options for zero-shot generalization at test time.
We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems.
- Score: 20.85397773933171
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The options framework in Hierarchical Reinforcement Learning breaks down
overall goals into a combination of options or simpler tasks and associated
policies, allowing for abstraction in the action space. Ideally, these options
can be reused across different higher-level goals; indeed, such reuse is
necessary to realize the vision of a continual learning agent that can
effectively leverage its prior experience. Previous approaches have only
proposed limited forms of transfer of prelearned options to new task settings.
We propose a novel option indexing approach to hierarchical learning (OI-HRL),
where we learn an affinity function between options and the items present in
the environment. This allows us to effectively reuse a large library of
pretrained options, in zero-shot generalization at test time, by restricting
goal-directed learning to only those options relevant to the task at hand. We
develop a meta-training loop that learns the representations of options and
environments over a series of HRL problems, by incorporating feedback about the
relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in
two simulated settings - the CraftWorld and AI2THOR environments - and show
that we achieve performance competitive with oracular baselines, and
substantial gains over a baseline that has the entire option pool available for
learning the hierarchical policy.
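Below is a minimal sketch of the option-indexing idea described in the abstract, assuming a dot-product affinity between learned option embeddings and embeddings of the items present in the environment, with top-k retrieval. The embedding sizes, the max-over-items aggregation, and the retrieval cutoff are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def affinity(option_embedding, item_embeddings):
    """Affinity between one option and the items in the current environment."""
    # Max over items: an option is treated as relevant if it matches at least
    # one environment item well (an assumed aggregation rule).
    return float(np.max(item_embeddings @ option_embedding))

def retrieve_options(option_embeddings, item_embeddings, k=5):
    """Indices of the k options with the highest affinity to this environment."""
    scores = np.array([affinity(opt, item_embeddings) for opt in option_embeddings])
    return np.argsort(scores)[::-1][:k].tolist()

# Example: a library of 100 pretrained options and an environment containing 8
# items, both embedded in a shared 16-dimensional space (hypothetical sizes).
rng = np.random.default_rng(0)
option_embeddings = rng.normal(size=(100, 16))
item_embeddings = rng.normal(size=(8, 16))
relevant = retrieve_options(option_embeddings, item_embeddings, k=5)
print("Hierarchical learning restricted to options:", relevant)

In the meta-training loop described in the abstract, the option and item representations would themselves be updated over a series of HRL problems, using feedback on whether the retrieved options were actually relevant to the higher-level goal; at test time, the hierarchical policy for a new task is then learned over only the retrieved subset rather than the full option pool.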
Related papers
- Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z)
- Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer [7.970144204429356]
We introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set.
We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible task (a standard form of this combination is sketched after this list).
arXiv Detail & Related papers (2022-06-22T19:00:08Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Discovery of Options via Meta-Learned Subgoals [59.2160583043938]
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
arXiv Detail & Related papers (2021-02-12T19:50:40Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
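For context on the successor-features entry above: in the standard formulation of generalized policy improvement with successor features (not specific to that paper), a library of policies \pi_i with successor features \psi^{\pi_i} can be combined for a new task whose reward is linear in known features, r(s,a) = \phi(s,a)^\top w_{new}:

\[
Q^{\pi_i}_{w_{\text{new}}}(s,a) = \psi^{\pi_i}(s,a)^{\top} w_{\text{new}},
\qquad
\pi_{\text{new}}(s) \in \arg\max_{a} \; \max_{i} \; \psi^{\pi_i}(s,a)^{\top} w_{\text{new}}.
\]

Each stored policy's action values on the new task are recovered directly from its successor features, and acting greedily over the best of them performs at least as well as every policy in the set.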