Signal Temporal Logic-Guided Apprenticeship Learning
- URL: http://arxiv.org/abs/2311.05084v1
- Date: Thu, 9 Nov 2023 00:59:28 GMT
- Title: Signal Temporal Logic-Guided Apprenticeship Learning
- Authors: Aniruddh G. Puranic, Jyotirmoy V. Deshmukh and Stefanos Nikolaidis
- Abstract summary: We show how temporal logic specifications that describe high-level task objectives are encoded in a graph to define a temporal-based metric.
We show how our framework overcomes the drawbacks of prior literature by drastically reducing the number of demonstrations required to learn a control policy.
- Score: 6.8500997328311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Apprenticeship learning crucially depends on effectively learning rewards,
and hence control policies, from user demonstrations. Of particular difficulty
is the setting where the desired task consists of a number of sub-goals with
temporal dependencies. The quality of inferred rewards, and hence policies, is
typically limited by the quality of demonstrations, and poor inference of these
can lead to undesirable outcomes. In this letter, we show how temporal logic
specifications that describe high-level task objectives are encoded in a graph
to define a temporal-based metric that reasons about behaviors of demonstrators
and the learner agent to improve the quality of inferred rewards and policies.
Through experiments on a diverse set of robot manipulator simulations, we show
how our framework overcomes the drawbacks of prior literature by drastically
reducing the number of demonstrations required to learn a control policy.
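As a rough illustration of how a temporal logic specification can score a demonstration, the sketch below uses the standard quantitative (robustness) semantics of Signal Temporal Logic: "eventually" takes the maximum of a signal, "always" takes the minimum, and conjunction takes the minimum of its operands. It is not the graph-based metric from the paper. The 2-D state, the goal and obstacle positions, the radii, and every function name are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical 2-D position of the robot


def eventually(signal: Sequence[float]) -> float:
    """Robustness of F(phi): the best value phi attains along the trace."""
    return max(signal)


def always(signal: Sequence[float]) -> float:
    """Robustness of G(phi): the worst value phi attains along the trace."""
    return min(signal)


def dist(a: State, b: State) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def robustness(trace: Sequence[State], goal: State, obstacle: State,
               goal_radius: float = 0.1, safe_radius: float = 0.2) -> float:
    """Robustness of: F(within goal_radius of goal) AND G(outside safe_radius of obstacle).

    A positive value means the trace satisfies the specification;
    a negative value means it violates it.
    """
    reach = eventually([goal_radius - dist(s, goal) for s in trace])
    avoid = always([dist(s, obstacle) - safe_radius for s in trace])
    return min(reach, avoid)  # conjunction is the minimum of both parts


def rank_demonstrations(demos: Sequence[Sequence[State]], goal: State,
                        obstacle: State) -> List[Sequence[State]]:
    """Sort demonstrations from most to least robust (assumed ranking rule)."""
    return sorted(demos, key=lambda d: robustness(d, goal, obstacle), reverse=True)


# Two made-up demonstrations: one keeps clear of the obstacle, one clips it.
GOAL, OBSTACLE = (1.0, 0.0), (0.5, 1.0)
good_demo = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
bad_demo = [(0.0, 0.0), (0.5, 0.9), (1.0, 0.0)]
ranked = rank_demonstrations([bad_demo, good_demo], GOAL, OBSTACLE)
```

Under these assumptions the demonstration that passes close to the obstacle receives a negative score and is ranked below the one that stays clear of it.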
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations [15.762916270583698]
Learning from demonstration (LfD) methods have shown promise for solving multi-step tasks.
In this work, we identify the roots of such a challenge as the failure of the learned continuous policy to satisfy the discrete plan implicit in the demonstration.
We prove our learned continuous policy can simulate any discrete plan specified by a Linear Temporal Logic (LTL) formula.
arXiv Detail & Related papers (2022-06-09T17:25:22Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives that is capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z)
- Learning from Demonstrations using Signal Temporal Logic [1.2182193687133713]
Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
arXiv Detail & Related papers (2021-02-15T18:28:36Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.