Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations
- URL: http://arxiv.org/abs/2206.04632v1
- Date: Thu, 9 Jun 2022 17:25:22 GMT
- Title: Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations
- Authors: Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah
- Abstract summary: Learning from demonstration (LfD) methods have shown promise for solving multi-step tasks.
In this work, we identify the roots of such a challenge as the failure of the learned continuous policy to satisfy the discrete plan implicit in the demonstration.
We prove our learned continuous policy can simulate any discrete plan specified by a Linear Temporal Logic (LTL) formula.
- Score: 15.762916270583698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning from demonstration (LfD) methods have shown promise for solving
multi-step tasks; however, these approaches do not guarantee successful
reproduction of the task given disturbances. In this work, we identify the
roots of such a challenge as the failure of the learned continuous policy to
satisfy the discrete plan implicit in the demonstration. By utilizing modes
(rather than subgoals) as the discrete abstraction and motion policies with
both mode invariance and goal reachability properties, we prove our learned
continuous policy can simulate any discrete plan specified by a Linear Temporal
Logic (LTL) formula. Consequently, the imitator is robust to both task- and
motion-level disturbances and guaranteed to achieve task success. Project page:
https://sites.google.com/view/ltl-ds
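To make the abstract's two key properties concrete, the sketch below rolls out a discrete plan as a sequence of per-mode motion policies: each policy is a stable dynamical system that converges to its mode's attractor (goal reachability), and a region-membership monitor triggers replanning whenever a disturbance ejects the state from the current mode (mode invariance). This is a minimal sketch under stated assumptions, not the authors' learned policies or LTL machinery: the linear dynamics, box-shaped mode regions, and chained-mode assumption (each attractor lies inside the next mode's region) are illustrative only.

```python
# Minimal sketch (assumed, not the paper's implementation) of mode-based
# imitation with goal reachability and mode-invariance monitoring.
import numpy as np

class ModePolicy:
    def __init__(self, goal, lo, hi, gain=2.0):
        self.goal = np.asarray(goal, dtype=float)  # mode attractor x*
        self.lo = np.asarray(lo, dtype=float)      # mode region (box), lower corner
        self.hi = np.asarray(hi, dtype=float)      # mode region (box), upper corner
        self.gain = gain

    def velocity(self, x):
        # xdot = -k (x - x*) is globally asymptotically stable at x*,
        # which gives the goal-reachability property.
        return -self.gain * (np.asarray(x) - self.goal)

    def contains(self, x):
        # Mode-invariance monitor: is the state still inside this mode?
        return bool(np.all((x >= self.lo) & (x <= self.hi)))

def run_plan(plan, x, dt=0.01, tol=1e-2, max_steps=100_000):
    """Roll out a discrete plan (a list of ModePolicy) with replanning:
    if a disturbance moves the state out of the current mode, resume from
    whichever mode actually contains the state."""
    i = 0
    for _ in range(max_steps):
        if not plan[i].contains(x):
            # Task-level disturbance: a crude stand-in for LTL replanning.
            i = next((j for j, m in enumerate(plan) if m.contains(x)), 0)
        x = x + dt * plan[i].velocity(x)            # motion-level step
        if np.linalg.norm(x - plan[i].goal) < tol:  # mode goal reached
            if i == len(plan) - 1:
                return x                            # task success
            i += 1                                  # advance to next mode
    raise RuntimeError("plan not completed within the step budget")
```

Because each mode policy is stable at its attractor and the monitor re-selects a valid mode after any perturbation, the rollout eventually chains through all modes; this is a toy analogue of the robustness-to-disturbance and task-success guarantees stated above.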
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to then deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Signal Temporal Logic-Guided Apprenticeship Learning [6.8500997328311]
We show how temporal logic specifications that describe high-level task objectives are encoded in a graph to define a temporal-based metric.
We show how our framework overcomes the drawbacks of prior literature by drastically reducing the number of demonstrations required to learn a control policy.
arXiv Detail & Related papers (2023-11-09T00:59:28Z)
- Task Phasing: Automated Curriculum Learning from Demonstrations [46.1680279122598]
Applying reinforcement learning to sparse reward domains is notoriously challenging due to insufficient guiding signals.
This paper introduces a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence.
Experimental results on 3 sparse reward domains demonstrate that our task phasing approach outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-20T03:59:11Z)
- Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To meet this need, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Provable Representation Learning for Imitation with Contrastive Fourier Features [27.74988221252854]
We consider using offline experience datasets to learn low-dimensional state representations.
A central challenge is that the unknown target policy itself may not exhibit low-dimensional behavior.
We derive a representation learning objective that provides an upper bound on the performance difference between the target policy and a low-dimensional policy trained with maximum likelihood.
arXiv Detail & Related papers (2021-05-26T00:31:30Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA), which encodes the LTL task specification, and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Learning from Demonstrations using Signal Temporal Logic [1.2182193687133713]
Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic (STL) to evaluate and rank the quality of demonstrations; a minimal robustness-ranking sketch appears after this list.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
arXiv Detail & Related papers (2021-02-15T18:28:36Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
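As a companion to the "Learning from Demonstrations using Signal Temporal Logic" entry above, here is a hedged sketch of the mechanism it describes: scoring each demonstration with the quantitative (robustness) semantics of an STL specification and ranking demonstrations by that score. The example specification ("eventually within eps of the goal, and speed always below v_max"), the signal names, and the thresholds are illustrative assumptions, not the paper's benchmark tasks.

```python
# Sketch of STL robustness scoring for ranking demonstrations.
import numpy as np

def rob_eventually(margins):
    # Robustness of "eventually phi": max over time of phi's margin.
    return float(np.max(margins))

def rob_always(margins):
    # Robustness of "always phi": min over time of phi's margin.
    return float(np.min(margins))

def demo_robustness(positions, velocities, goal, eps=0.05, v_max=1.0):
    positions = np.asarray(positions, dtype=float)    # shape (T, d)
    velocities = np.asarray(velocities, dtype=float)  # shape (T, d)
    reach = eps - np.linalg.norm(positions - goal, axis=1)  # > 0 near goal
    slow = v_max - np.linalg.norm(velocities, axis=1)       # > 0 when slow
    # Conjunction: robustness of "phi and psi" is the minimum of the two.
    return min(rob_eventually(reach), rob_always(slow))

def rank_demos(demos, goal):
    """Sort demonstrations best-first by STL robustness; positive scores
    satisfy the specification, negative scores violate it."""
    return sorted(demos,
                  key=lambda d: demo_robustness(d["pos"], d["vel"], goal),
                  reverse=True)
```

A downstream learner can then weight or filter demonstrations by these scores before imitation, which is the evaluate-and-rank step the entry above describes.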
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.