Signal Temporal Logic-Guided Apprenticeship Learning
- URL: http://arxiv.org/abs/2311.05084v1
- Date: Thu, 9 Nov 2023 00:59:28 GMT
- Title: Signal Temporal Logic-Guided Apprenticeship Learning
- Authors: Aniruddh G. Puranic, Jyotirmoy V. Deshmukh and Stefanos Nikolaidis
- Abstract summary: We show how temporal logic specifications that describe high-level task objectives are encoded in a graph to define a temporal-based metric.
We show how our framework overcomes the drawbacks of prior literature by drastically reducing the number of demonstrations required to learn a control policy.
- Score: 6.8500997328311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Apprenticeship learning crucially depends on effectively learning rewards,
and hence control policies from user demonstrations. Of particular difficulty
is the setting where the desired task consists of a number of sub-goals with
temporal dependencies. The quality of inferred rewards, and hence of policies, is
typically limited by the quality of demonstrations, and poor inference can lead
to undesirable outcomes. In this letter, we show how temporal logic
specifications that describe high-level task objectives are encoded in a graph
to define a temporal-based metric that reasons about the behaviors of
demonstrators and the learner agent to improve the quality of inferred rewards
and policies. Through experiments on a diverse set of robot manipulator
simulations, we show how our framework overcomes the drawbacks of prior
literature by drastically reducing the number of demonstrations required to
learn a control policy.
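For readers unfamiliar with STL, the ingredient that makes such a metric possible is its quantitative (robustness) semantics: evaluating a specification on a trajectory yields a real number whose sign indicates satisfaction and whose magnitude measures the margin. Below is a minimal Python sketch of these standard semantics for a small fragment (conjunction, eventually, always over predicates); it illustrates the metric's building blocks only and is not the authors' graph-based encoding.

```python
# Minimal sketch of standard STL quantitative (robustness) semantics over a
# discrete-time trace: the sign indicates satisfaction, the magnitude a margin.
# Illustrative only -- this is not the authors' graph-based encoding.
import numpy as np

def rob_and(r1, r2):
    # Conjunction: the worst of the two margins.
    return min(r1, r2)

def rob_eventually(trace, f, t, a, b):
    # F[a,b](f(x) > 0): the best margin anywhere in the window [t+a, t+b].
    return max(f(trace[k]) for k in range(t + a, min(t + b + 1, len(trace))))

def rob_always(trace, f, t, a, b):
    # G[a,b](f(x) > 0): the worst margin anywhere in the window [t+a, t+b].
    return min(f(trace[k]) for k in range(t + a, min(t + b + 1, len(trace))))

# Toy task: "eventually be within 0.1 of the goal at x = 1, and always keep
# speed below 1". States are [position, speed].
trace = [np.array([x, 0.5]) for x in np.linspace(0.0, 1.0, 20)]
reach = rob_eventually(trace, lambda s: 0.1 - abs(s[0] - 1.0), 0, 0, 19)
safe = rob_always(trace, lambda s: 1.0 - s[1], 0, 0, 19)
print(rob_and(reach, safe))  # 0.1 > 0, so the trace satisfies the spec
```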
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z) - Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL that provides multiple levels of temporal abstraction for listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experiments show significant performance improvements over several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
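As a rough illustration of the implicit-model idea, such methods typically train a contrastive critic that scores (state, action) embeddings against embeddings of future states, using other pairs in the batch as negatives. The InfoNCE-style loss below is a generic sketch; the encoders and pairing scheme are assumptions, not the paper's exact design.

```python
# Sketch: a contrastive (InfoNCE-style) objective over (state, action) and
# future-state embeddings, standing in for an explicit reward model.
# Encoder architectures and the pairing scheme are illustrative assumptions.
import numpy as np

def info_nce_loss(sa_embed, future_embed):
    """sa_embed: (B, d) embeddings of (state, action) pairs.
    future_embed: (B, d) embeddings of the future states actually reached.
    Positives lie on the diagonal; every other in-batch pairing is a negative."""
    logits = sa_embed @ future_embed.T               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on diagonal
```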
arXiv Detail & Related papers (2023-07-24T19:43:22Z) - Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from
Demonstrations [15.762916270583698]
Learning from demonstration (LfD) methods have shown promise for solving multi-step tasks.
In this work, we identify the root of this challenge as the failure of the learned continuous policy to satisfy the discrete plan implicit in the demonstration.
We prove our learned continuous policy can simulate any discrete plan specified by a Linear Temporal Logic (LTL) formula.
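Intuitively, the discrete plan can be represented as a finite automaton compiled from the LTL formula, and a continuous rollout satisfies the plan iff its sequence of discrete modes drives the automaton to an accepting state. The sketch below checks mode sequences against a hypothetical two-step "reach A, then B" automaton; a real pipeline would derive the automaton with an LTL translation tool such as Spot.

```python
# Sketch: check a rollout's discrete mode sequence against a DFA compiled
# from an LTL formula. The automaton here is a hypothetical "visit A, then B"
# task, not taken from the paper; real LTL-to-automaton translation would
# use a dedicated tool such as Spot.
ACCEPTING = {"done"}
DELTA = {  # (automaton state, observed mode) -> next automaton state
    ("start", "A"): "saw_A", ("start", "B"): "start",
    ("saw_A", "B"): "done",  ("saw_A", "A"): "saw_A",
    ("done",  "A"): "done",  ("done",  "B"): "done",
}

def satisfies(mode_sequence, state="start"):
    for mode in mode_sequence:
        state = DELTA.get((state, mode), state)  # self-loop on unlisted modes
    return state in ACCEPTING

print(satisfies(["A", "A", "B"]))  # True: reaches A, then B
print(satisfies(["B", "A"]))       # False: B before A does not count
```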
arXiv Detail & Related papers (2022-06-09T17:25:22Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
IMPLANT (Imitation with Planning at Test-time) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Learning from Demonstrations using Signal Temporal Logic [1.2182193687133713]
Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
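Concretely, once each demonstration trace has an STL robustness score, the scores can be used to rank demonstrations or to weight their influence on reward inference. The helper below is an illustrative sketch: `robustness` is assumed to implement semantics like those sketched earlier, and the softmax-style weighting is one plausible choice, not the paper's.

```python
# Sketch: rank (and weight) demonstrations by STL robustness before reward
# inference. `robustness(trace, spec)` is an assumed callable implementing
# robustness semantics; the weighting scheme is illustrative only.
import numpy as np

def rank_demonstrations(demos, spec, robustness):
    scores = np.array([robustness(d, spec) for d in demos])
    order = np.argsort(-scores)               # best (largest margin) first
    weights = np.exp(scores - scores.max())   # softmax-style demo weights
    weights /= weights.sum()
    return [demos[i] for i in order], weights[order]
```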
arXiv Detail & Related papers (2021-02-15T18:28:36Z) - DDPG++: Striving for Simplicity in Continuous-control Off-Policy
Reinforcement Learning [95.60782037764928]
First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
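On the first point, a common way to control critic overestimation (popularized by TD3) is to form the Bellman target from the minimum of two target critics. The sketch below shows that mechanism; `q1_target`, `q2_target`, and `actor_target` are hypothetical callables, and this is one standard fix, not necessarily the exact one used in DDPG++.

```python
# Sketch: controlling overestimation bias with a clipped double-Q target,
# as in TD3. The target networks are hypothetical callables; this is one
# standard mechanism, not necessarily DDPG++'s exact variant.
import numpy as np

def bellman_target(r, s_next, done, q1_target, q2_target, actor_target,
                   gamma=0.99):
    a_next = actor_target(s_next)
    q_next = np.minimum(q1_target(s_next, a_next),
                        q2_target(s_next, a_next))  # pessimistic estimate
    return r + gamma * (1.0 - done) * q_next
```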
arXiv Detail & Related papers (2020-06-26T20:21:12Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.