Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2104.05043v1
- Date: Sun, 11 Apr 2021 16:26:10 GMT
- Title: Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep
Reinforcement Learning
- Authors: Jinxin Liu, Donglin Wang, Qiangxing Tian, Zhengyu Chen
- Abstract summary: We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
- Score: 9.014110264448371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is important for an agent to learn a widely applicable,
general-purpose policy that can achieve diverse goals, including images and
text descriptions. Given such perceptually-specific goals, the frontier of
deep reinforcement learning research is to learn a goal-conditioned policy
without hand-crafted rewards. To learn such a policy, recent works typically
use the non-parametric distance to a given goal in an explicit embedding
space as the reward. From a different viewpoint, we propose a novel
unsupervised learning approach named goal-conditioned policy with intrinsic
motivation (GPIM), which jointly learns both an abstract-level policy and a
goal-conditioned policy. The abstract-level policy is conditioned on a latent
variable to optimize a discriminator and discovers diverse states that are
further rendered into perceptually-specific goals for the goal-conditioned
policy. The learned discriminator serves as an intrinsic reward function for
the goal-conditioned policy to imitate the trajectory induced by the
abstract-level policy. Experiments on various robotic tasks demonstrate the
effectiveness and efficiency of our proposed GPIM method which substantially
outperforms prior techniques.
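As a rough, minimal sketch of the mechanism the abstract describes (assuming a discrete latent with a uniform prior; the architecture and names below are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """q_phi(z | s): classifies which latent variable z produced state s."""
    def __init__(self, state_dim, n_latents, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_latents))

    def forward(self, state):
        return self.net(state)  # logits over the discrete latents

def intrinsic_reward(disc, state, z, n_latents):
    """r = log q(z | s) - log p(z), with a uniform prior p(z).
    Evaluated along the goal-conditioned policy's states, this reward
    drives it to imitate the trajectory that rendered the goal."""
    with torch.no_grad():
        log_q = torch.log_softmax(disc(state), dim=-1)[..., z]
    log_p = -torch.log(torch.tensor(float(n_latents)))
    return log_q - log_p
```

Both policies can then be trained with any off-the-shelf RL algorithm on this reward; the shared discriminator is what ties the abstract-level and goal-conditioned policies together.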
Related papers
- Learning Control Policies for Variable Objectives from Offline Data [2.7174376960271154]
We introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP).
We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime.
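A minimal sketch of this runtime re-balancing idea, assuming a vector-valued reward and a toy linear policy (not the paper's model-based method):

```python
import numpy as np

def scalarized_reward(reward_components, w):
    """Combine per-objective reward terms with user-chosen weights w."""
    return float(np.dot(reward_components, w))

def policy(state, w, theta):
    """pi(a | s, w): the objective weights are simply part of the input."""
    features = np.concatenate([state, w])
    return np.tanh(theta @ features)  # illustrative linear-tanh policy

# At runtime the user changes w without retraining:
state, theta = np.zeros(4), np.zeros((2, 6))
a_cautious = policy(state, np.array([0.9, 0.1]), theta)
a_aggressive = policy(state, np.array([0.2, 0.8]), theta)
```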
arXiv Detail & Related papers (2023-08-11T13:33:59Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
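A generic latent-variable policy in the spirit of this summary might look as follows (a sketch; the paper's exact parameterization and world-model machinery are not shown):

```python
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    """pi(a | s) = E_z[ pi(a | s, z) ]: sampling z first, then acting,
    lets one network represent several distinct trajectory modes."""
    def __init__(self, state_dim, act_dim, z_dim=4, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, state):  # state: (batch, state_dim)
        z = torch.randn(state.shape[0], self.z_dim)
        return self.net(torch.cat([state, z], dim=-1))
```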
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
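One way to picture the distillation step, as a sketch under assumptions (an MSE imitation loss; the paper's actual objective may differ):

```python
import torch

def distill_loss(policy, states, subgoals, final_goals):
    """Make pi(. | s, final_goal) imitate pi(. | s, subgoal) on the
    states the agent actually visited."""
    with torch.no_grad():
        teacher_actions = policy(states, subgoals)   # nearby, easier subgoal
    student_actions = policy(states, final_goals)    # distant target goal
    return ((student_actions - teacher_actions) ** 2).mean()
```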
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
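For context, a minimal sketch of the goal-conditioned off-policy setting this builds on, with a sparse goal-reached reward (the distillation machinery itself is not shown):

```python
import torch

def td_target(q_net, next_state, goal, reached, gamma=0.99):
    """y = r + gamma * (1 - done) * max_a Q(s', g, a), with the sparse
    reward r = 1 exactly when the goal is reached."""
    with torch.no_grad():
        next_q = q_net(next_state, goal).max(dim=-1).values
    done = reached.float()
    return done + gamma * (1.0 - done) * next_q
```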
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
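The high-level recipe could be sketched as recursive subgoal chaining; the helpers `subgoal_gen` and `reach` are hypothetical stand-ins for the paper's learned components:

```python
def plan_and_practice(subgoal_gen, reach, state, final_goal, depth=3):
    """Recursively split (state -> goal) into easier legs via subgoals;
    `reach` runs the goal-conditioned policy and returns the end state."""
    if depth == 0:
        return reach(state, final_goal)        # let the policy try directly
    mid = subgoal_gen(state, final_goal)       # imagined intermediate goal
    state = plan_and_practice(subgoal_gen, reach, state, mid, depth - 1)
    return plan_and_practice(subgoal_gen, reach, state, final_goal, depth - 1)
```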
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
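A simplified sketch of the idea (the KL direction and stop-gradient placement here are assumptions, and the policy is assumed to return a torch.distributions object):

```python
import torch

def subgoal_regularizer(policy, high_level, state, goal):
    """KL(pi(. | s, subgoal) || pi(. | s, g)) as an auxiliary loss: the
    flat policy is pulled toward the easier subgoal-reaching behavior."""
    with torch.no_grad():
        subgoal = high_level(state, goal)      # imagined intermediate goal
        dist_sub = policy(state, subgoal)
    dist_goal = policy(state, goal)
    return torch.distributions.kl_divergence(dist_sub, dist_goal).mean()
```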
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Provable Representation Learning for Imitation with Contrastive Fourier Features [27.74988221252854]
We consider using offline experience datasets to learn low-dimensional state representations.
A central challenge is that the unknown target policy itself may not exhibit low-dimensional behavior.
We derive a representation learning objective which provides an upper bound on the performance difference between the target policy and a low-dimensional policy trained with maximum likelihood.
arXiv Detail & Related papers (2021-05-26T00:31:30Z)
- Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [116.804536884437]
We propose an opposite behavior aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimate to improve the target agent by regarding it as part of the target policy.
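A minimal sketch of the opponent-estimation step, with hypothetical dimensions (not the paper's architecture):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 8-dim dialogue state, 4 opponent actions.
opp_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(opp_model.parameters(), lr=1e-3)

def update_opponent_estimate(states, observed_actions):
    """Cross-entropy fit of the opponent's action distribution from its
    observed behavior; the estimate then feeds the target agent."""
    loss = nn.functional.cross_entropy(opp_model(states), observed_actions)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```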
arXiv Detail & Related papers (2020-04-21T03:13:44Z)
- Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration [33.25932244741268]
Off-policy reinforcement learning (RL) is concerned with learning a rewarding policy by executing another policy that gathers samples of experience.
While the former policy is rewarding but inexpressive (in most cases, deterministic), gathering experience well requires, in contrast, an expressive policy that offers guided and effective exploration.
We propose Analogous Disentangled Actor-Critic (ADAC) to mitigate this problem.
arXiv Detail & Related papers (2020-02-25T08:49:11Z)