Goal-Conditioned Reinforcement Learning with Imagined Subgoals
- URL: http://arxiv.org/abs/2107.00541v1
- Date: Thu, 1 Jul 2021 15:30:59 GMT
- Title: Goal-Conditioned Reinforcement Learning with Imagined Subgoals
- Authors: Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev
- Abstract summary: We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
- Score: 89.67840168694259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-conditioned reinforcement learning endows an agent with a large variety
of skills, but it often struggles to solve tasks that require more temporally
extended reasoning. In this work, we propose to incorporate imagined subgoals
into policy learning to facilitate learning of complex tasks. Imagined subgoals
are predicted by a separate high-level policy, which is trained simultaneously
with the policy and its critic. This high-level policy predicts intermediate
states halfway to the goal using the value function as a reachability metric.
We do not require the policy to reach these subgoals explicitly. Instead, we use
them to define a prior policy, and incorporate this prior into a KL-constrained
policy iteration scheme to speed up and regularize learning. Imagined subgoals
are used during policy learning, but not during test time, where we only apply
the learned policy. We evaluate our approach on complex robotic navigation and
manipulation tasks and show that it outperforms existing methods by a large
margin.
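To make the update concrete, below is a minimal sketch (not the authors' code) of a KL-regularized policy improvement step in which the prior policy is the agent's own policy conditioned on an imagined subgoal. It assumes PyTorch, diagonal Gaussian policies, and a goal-conditioned critic; all class names, network sizes, and the coefficient `alpha` are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch: KL-regularized policy update with an imagined-subgoal prior.
# Class names, architectures, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.distributions as D

class GaussianPolicy(nn.Module):
    """pi(. | s, g): diagonal Gaussian over an output space of size out_dim."""
    def __init__(self, obs_dim, goal_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * out_dim),
        )
    def forward(self, obs, goal):
        mu, log_std = self.net(torch.cat([obs, goal], dim=-1)).chunk(2, dim=-1)
        return D.Normal(mu, log_std.clamp(-5.0, 2.0).exp())

class Critic(nn.Module):
    """Q(s, a, g): goal-conditioned action-value function."""
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, action, goal):
        return self.net(torch.cat([obs, action, goal], dim=-1)).squeeze(-1)

def policy_loss(policy, subgoal_policy, critic, obs, goal, alpha=0.1):
    """KL-constrained policy improvement with an imagined-subgoal prior.

    The prior is the (detached) policy conditioned on a subgoal sampled from
    the high-level policy, so the agent never has to reach the subgoal itself.
    """
    dist = policy(obs, goal)                          # pi(a | s, g)
    action = dist.rsample()                           # reparameterized action sample
    q_value = critic(obs, action, goal)               # goal-conditioned critic value

    with torch.no_grad():
        subgoal = subgoal_policy(obs, goal).sample()  # imagined subgoal toward g
        prior = policy(obs, subgoal)                  # prior policy pi(a | s, subgoal)

    kl = D.kl_divergence(dist, prior).sum(dim=-1)     # KL(pi(. | s, g) || prior)
    return (-q_value + alpha * kl).mean()             # maximize Q, stay near the prior
```

In this sketch, `subgoal_policy` would be a `GaussianPolicy` whose output lives in the goal space (so `subgoal` has the same dimensionality as `goal`), and it would be trained separately to propose states that the value function judges to be roughly halfway between the current state and the goal.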
Related papers
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Interpretable Reinforcement Learning with Multilevel Subgoal Discovery [77.34726150561087]
We propose a novel Reinforcement Learning model for discrete environments.
In the model, an agent learns information about the environment in the form of probabilistic rules.
No reward function is required for learning; an agent only needs to be given a primary goal to achieve.
arXiv Detail & Related papers (2022-02-15T14:04:44Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies such that, when combined, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that access to a specific set of diverse policies, which we call a set of independent policies, allows an agent to instantaneously achieve high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization; a minimal sketch of the underlying idea appears after this list.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning [9.014110264448371]
We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
arXiv Detail & Related papers (2021-04-11T16:26:10Z)
- Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting [26.13332231423652]
We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients.
We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines.
arXiv Detail & Related papers (2020-07-14T13:05:42Z)
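For the Implicit Q-Learning entry above, here is a minimal sketch (again in PyTorch, with hypothetical function names and hyperparameters) of the asymmetric-regression idea commonly associated with that method: the value function is fit only to Q-values of actions that actually appear in the dataset, and the bootstrapped target uses V(s') instead of a maximum over actions, so no out-of-dataset action is ever evaluated.

```python
# Minimal sketch of the in-dataset value update behind Implicit Q-Learning.
# Function names and hyperparameters are assumptions, not the paper's code.
import torch

def expectile_loss(q_value, value, tau=0.7):
    """Asymmetric L2 loss: tau > 0.5 pushes V(s) toward an upper expectile of Q(s, a)."""
    diff = q_value - value                      # Q(s, a) - V(s) for dataset actions only
    weight = torch.abs(tau - (diff < 0).float())  # |tau - 1(diff < 0)|
    return (weight * diff.pow(2)).mean()

def q_target(reward, done, next_value, gamma=0.99):
    """Bootstrapped target r + gamma * V(s'): no action query outside the dataset."""
    return reward + gamma * (1.0 - done) * next_value
```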