Discovery of Options via Meta-Learned Subgoals
- URL: http://arxiv.org/abs/2102.06741v1
- Date: Fri, 12 Feb 2021 19:50:40 GMT
- Title: Discovery of Options via Meta-Learned Subgoals
- Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh,
Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh
- Abstract summary: Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
- Score: 59.2160583043938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal abstractions in the form of options have been shown to help
reinforcement learning (RL) agents learn faster. However, despite prior work on
this topic, the problem of discovering options through interaction with an
environment remains a challenge. In this paper, we introduce a novel
meta-gradient approach for discovering useful options in multi-task RL
environments. Our approach is based on a manager-worker decomposition of the RL
agent, in which a manager maximises rewards from the environment by learning a
task-dependent policy over both a set of task-independent discovered options
and primitive actions. The option-reward and termination functions that define
a subgoal for each option are parameterised as neural networks and trained via
meta-gradients to maximise their usefulness. Empirical analysis on gridworld
and DeepMind Lab tasks shows that: (1) our approach can discover meaningful and
diverse temporally-extended options in multi-task RL domains, (2) the
discovered options are frequently used by the agent while learning to solve the
training tasks, and (3) the discovered options help a randomly initialised
manager learn faster in completely new tasks.
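To make the manager-worker decomposition above concrete, the following is a minimal sketch assuming PyTorch; the module names, dimensions, and the convention that the first indices of the manager's output denote options are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' code) of the manager-worker decomposition
# described in the abstract. Dimensions, module names, and the option-indexing
# convention are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, NUM_PRIMITIVE_ACTIONS, NUM_OPTIONS, HIDDEN = 16, 4, 3, 64

class Option(nn.Module):
    """One discovered option: a worker policy plus the option-reward and
    termination networks that define its subgoal (meta-learned in the paper)."""
    def __init__(self):
        super().__init__()
        self.worker = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                    nn.Linear(HIDDEN, NUM_PRIMITIVE_ACTIONS))
        self.option_reward = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                           nn.Linear(HIDDEN, 1))
        self.termination = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                         nn.Linear(HIDDEN, 1), nn.Sigmoid())

class Manager(nn.Module):
    """Task-dependent policy over the discovered options and primitive actions."""
    def __init__(self):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, NUM_OPTIONS + NUM_PRIMITIVE_ACTIONS))

    def act(self, obs):
        return torch.distributions.Categorical(logits=self.policy(obs)).sample()

options = nn.ModuleList([Option() for _ in range(NUM_OPTIONS)])
manager = Manager()
obs = torch.randn(OBS_DIM)
choice = manager.act(obs).item()
if choice < NUM_OPTIONS:  # first NUM_OPTIONS indices denote options (an assumption)
    opt = options[choice]
    # The worker acts until the option's termination network fires; it would be
    # trained against the option_reward network rather than the task reward.
    action = torch.distributions.Categorical(logits=opt.worker(obs)).sample().item()
    terminate = torch.bernoulli(opt.termination(obs)).item()
else:
    action = choice - NUM_OPTIONS  # a primitive action executed directly

Per the abstract, the option-reward and termination networks are the meta-learned parts: the manager and workers are trained on rewards in an inner loop, and these networks are then adjusted via meta-gradients to maximise their usefulness. The sketch above omits that training loop.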
Related papers
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP).
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code can be found at https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning [20.85397773933171]
We propose a novel option indexing approach to hierarchical reinforcement learning (OI-HRL).
This allows us to effectively reuse a large library of pretrained options for zero-shot generalization at test time.
We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems.
arXiv Detail & Related papers (2022-06-12T14:39:02Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples, drawn from task-agnostic storage, for the model-based RL (MBRL) agent.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method accurately infers the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.