Plan Your Target and Learn Your Skills: Transferable State-Only
Imitation Learning via Decoupled Policy Optimization
- URL: http://arxiv.org/abs/2203.02214v1
- Date: Fri, 4 Mar 2022 09:46:29 GMT
- Title: Plan Your Target and Learn Your Skills: Transferable State-Only
Imitation Learning via Decoupled Policy Optimization
- Authors: Minghuan Liu, Zhengbang Zhu, Yuzheng Zhuang, Weinan Zhang, Jianye Hao,
Yong Yu, Jun Wang
- Abstract summary: We introduce Decoupled Policy Optimization (DePO), which explicitly decouples the policy as a high-level state planner and an inverse dynamics model.
With embedded decoupled policy gradient and generative adversarial training, DePO enables knowledge transfer to different action spaces or state transition dynamics.
- Score: 44.32548301913779
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent progress in state-only imitation learning extends the scope of
applicability of imitation learning to real-world settings by relieving the
need for observing expert actions. However, existing solutions only learn to
extract a state-to-action mapping policy from the data, without considering how
the expert plans to reach the target. This hinders the ability to leverage
demonstrations and limits the flexibility of the policy. In this paper, we
introduce Decoupled Policy Optimization (DePO), which explicitly decouples the
policy as a high-level state planner and an inverse dynamics model. With
embedded decoupled policy gradient and generative adversarial training, DePO
enables knowledge transfer to different action spaces or state transition
dynamics, and can generalize the planner to out-of-demonstration state regions.
Our in-depth experimental analysis shows the effectiveness of DePO on learning
a generalized target state planner while achieving the best imitation
performance. We demonstrate the appealing usage of DePO for transferring across
different tasks by pre-training, and the potential for co-training agents with
various skills.
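To make the decoupled structure concrete, below is a minimal PyTorch-style sketch (not the authors' implementation; the module names, dimensions, and the Gaussian planner are illustrative assumptions) of a policy split into a high-level state planner h(s'|s) and an inverse dynamics model I(a|s,s'), as the abstract describes.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


class DecoupledPolicy(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        # High-level state planner h(s'|s): a Gaussian over the next target state.
        self.planner_mu = mlp(state_dim, state_dim)
        self.planner_log_std = nn.Parameter(torch.zeros(state_dim))
        # Inverse dynamics model I(a|s,s'): recovers the action that realizes
        # the planned transition. Only this module touches the action space.
        self.inverse_dynamics = mlp(2 * state_dim, action_dim)

    def forward(self, state):
        # Plan the next target state, then query the inverse dynamics for
        # the action that reaches it.
        plan = torch.distributions.Normal(
            self.planner_mu(state), self.planner_log_std.exp())
        next_state = plan.rsample()  # reparameterized, so gradients flow
        action = self.inverse_dynamics(torch.cat([state, next_state], dim=-1))
        return action, next_state, plan.log_prob(next_state).sum(-1)


policy = DecoupledPolicy(state_dim=11, action_dim=3)
actions, targets, logp = policy(torch.randn(4, 11))  # batch of 4 states
```

Because only the inverse dynamics module depends on the action space, transferring to a different embodiment or transition dynamics amounts to swapping or retraining that module while reusing the learned planner, which is the transfer property the abstract highlights.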
Related papers
- Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning [53.9544543607396]
We propose a novel framework that integrates reward rendering with Imitation from Observation (IfO).
By instantiating F-distance in different ways, we derive two theoretical analyses and develop a practical algorithm called Accessible State Oriented Policy Regularization (ASOR).
ASOR serves as a general add-on module that can be incorporated into various RL approaches, including offline RL and off-policy RL.
arXiv Detail & Related papers (2025-03-10T03:50:20Z) - EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning.
EPO provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior.
Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z) - Time-Efficient Reinforcement Learning with Stochastic Stateful Policies [20.545058017790428]
We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy (a minimal sketch of this decomposition appears after the list).
We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning algorithms.
arXiv Detail & Related papers (2023-11-07T15:48:07Z) - IOB: Integrating Optimization Transfer and Behavior Transfer for
Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Provable Representation Learning for Imitation with Contrastive Fourier
Features [27.74988221252854]
We consider using offline experience datasets to learn low-dimensional state representations.
A central challenge is that the unknown target policy itself may not exhibit low-dimensional behavior.
We derive a representation learning objective which provides an upper bound on the performance difference between the target policy and a low-dimensional policy trained with max-likelihood.
arXiv Detail & Related papers (2021-05-26T00:31:30Z) - Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep
Reinforcement Learning [9.014110264448371]
We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
arXiv Detail & Related papers (2021-04-11T16:26:10Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
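As a side note on the stochastic stateful policies entry above, here is a minimal sketch (hypothetical names and dimensions, not the paper's code) of a policy decomposed into a stochastic internal state kernel, which propagates a latent memory, and a stateless policy head that acts on the current observation and memory:

```python
import torch
import torch.nn as nn


class StatefulPolicy(nn.Module):
    """Stateful policy = stochastic internal state kernel + stateless head."""

    def __init__(self, state_dim, action_dim, latent_dim=8):
        super().__init__()
        # Kernel: a Gaussian over the next latent memory z' given (s, z).
        self.kernel = nn.Linear(state_dim + latent_dim, 2 * latent_dim)
        # Stateless head: maps (s, z') to an action; no hidden recurrence.
        self.head = nn.Linear(state_dim + latent_dim, action_dim)

    def step(self, state, z):
        mu, log_std = self.kernel(torch.cat([state, z], dim=-1)).chunk(2, dim=-1)
        z_next = torch.distributions.Normal(mu, log_std.exp()).rsample()
        action = self.head(torch.cat([state, z_next], dim=-1))
        return action, z_next


policy = StatefulPolicy(state_dim=11, action_dim=3)
z = torch.zeros(1, 8)
for state in torch.randn(5, 1, 11):  # roll out a short 5-step trajectory
    action, z = policy.step(state, z)
```

Because the memory update is an explicit stochastic kernel rather than hidden recurrence, the policy gradient can be estimated step by step instead of backpropagating through time, which appears to be the source of the time-efficiency the entry's title refers to.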
This list is automatically generated from the titles and abstracts of the papers on this site.