Learning from Demonstrations using Signal Temporal Logic
- URL: http://arxiv.org/abs/2102.07730v1
- Date: Mon, 15 Feb 2021 18:28:36 GMT
- Title: Learning from Demonstrations using Signal Temporal Logic
- Authors: Aniruddh G. Puranic, Jyotirmoy V. Deshmukh and Stefanos Nikolaidis
- Abstract summary: Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
- Score: 1.2182193687133713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-from-demonstrations is an emerging paradigm to obtain effective
robot control policies for complex tasks via reinforcement learning without the
need to explicitly design reward functions. However, it is susceptible to
imperfections in demonstrations and also raises concerns of safety and
interpretability in the learned control policies. To address these issues, we
use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
Temporal logic-based specifications allow us to create non-Markovian rewards,
and also define interesting causal dependencies between tasks such as
sequential task specifications. We validate our approach through experiments on
discrete-world and OpenAI Gym environments, and show that our approach
outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement
Learning.
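For concreteness, here is a minimal Python sketch of the core idea: STL quantitative semantics (robustness) assign each trajectory a real-valued satisfaction margin, which can then be used to rank demonstrations. The specification, signals, and thresholds below are illustrative assumptions, not the formulas from the paper.

```python
# Minimal STL robustness computation for ranking demonstrations.
# Illustrative only: the spec, signals, and thresholds are assumptions,
# not the specification used in the paper.
from typing import Callable, List

Signal = List[float]  # a 1-D signal sampled at discrete time steps

def always(margins: Signal) -> float:
    """rho(G phi) = min over time of rho(phi)."""
    return min(margins)

def eventually(margins: Signal) -> float:
    """rho(F phi) = max over time of rho(phi)."""
    return max(margins)

def predicate(signal: Signal, f: Callable[[float], float]) -> Signal:
    """Pointwise margin f(x_t); f(x_t) >= 0 means the predicate holds at t."""
    return [f(x) for x in signal]

def robustness(dist_to_goal: Signal, dist_to_obstacle: Signal) -> float:
    """Spec: eventually reach the goal AND always stay clear of the obstacle.
    rho(phi1 and phi2) = min(rho(phi1), rho(phi2))."""
    reach = eventually(predicate(dist_to_goal, lambda d: 0.1 - d))
    safe = always(predicate(dist_to_obstacle, lambda d: d - 0.5))
    return min(reach, safe)

# Rank demonstrations: higher robustness = better demonstration.
demos = {
    "demo_a": ([0.9, 0.5, 0.05], [2.0, 1.5, 1.2]),  # reaches goal, stays safe
    "demo_b": ([0.9, 0.6, 0.4], [2.0, 0.4, 1.2]),   # misses goal, grazes obstacle
}
ranked = sorted(demos, key=lambda k: robustness(*demos[k]), reverse=True)
print(ranked)  # ['demo_a', 'demo_b']
```

A positive robustness value means the demonstration satisfies the specification, and its magnitude gives a graded quality score; this graded, trajectory-level signal is what makes both ranking and non-Markovian reward design possible.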
Related papers
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- Signal Temporal Logic-Guided Apprenticeship Learning [6.8500997328311]
We show how temporal logic specifications that describe high-level task objectives are encoded in a graph to define a temporal-based metric.
We show how our framework overcomes the drawbacks of prior literature by drastically reducing the number of demonstrations required to learn a control policy.
arXiv Detail & Related papers (2023-11-09T00:59:28Z)
- Scaling In-Context Demonstrations with Structured Attention [75.41845145597875]
We propose a better architectural design for in-context learning.
Structured Attention for In-Context Learning (SAICL) replaces full attention with a structured attention mechanism.
We show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up.
arXiv Detail & Related papers (2023-07-05T23:26:01Z)
- Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation [1.7901837062462316]
This paper aims to define and incorporate the natural symmetry present in physical robotic environments.
The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
arXiv Detail & Related papers (2023-04-12T11:38:01Z)
- Funnel-based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning [0.0]
We propose a tractable reinforcement learning algorithm to learn a controller that enforces Signal Temporal Logic (STL) specifications.
We demonstrate the utility of our approach on several STL tasks in different environments (see the robustness-to-reward sketch after this list).
arXiv Detail & Related papers (2022-11-30T19:38:21Z)
- Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- Foreseeing the Benefits of Incidental Supervision [83.08441990812636]
This paper studies whether we can, in a single framework, quantify the benefits of various types of incidental signals for a given target task without going through experiments.
We propose a unified PAC-Bayesian motivated informativeness measure, PABI, that characterizes the uncertainty reduction provided by incidental supervision signals.
arXiv Detail & Related papers (2020-06-09T20:59:42Z)
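As referenced in the funnel-based reward shaping entry above, here is a generic sketch of turning STL margins into dense RL rewards. The shaping functions, thresholds, and trajectory below are illustrative assumptions; in particular, this is not the funnel construction from that paper.

```python
# Illustrative robustness-based reward shaping (NOT the funnel construction
# from the paper): reward each step by the margin of the safety predicate,
# plus a terminal bonus from the full-trace robustness of the reach spec.
import math
from typing import List

def step_reward(dist_to_obstacle: float, safe_radius: float = 0.5) -> float:
    """Dense per-step reward: positive when outside the assumed safe radius."""
    return math.tanh(dist_to_obstacle - safe_radius)

def terminal_reward(dists_to_goal: List[float], goal_radius: float = 0.1) -> float:
    """Terminal bonus: robustness of 'eventually reach the goal'."""
    return max(goal_radius - d for d in dists_to_goal)

# Usage on a short hypothetical trajectory:
obstacle_dists = [2.0, 1.5, 1.2]
goal_dists = [0.9, 0.5, 0.05]
total = sum(step_reward(d) for d in obstacle_dists) + terminal_reward(goal_dists)
print(round(total, 3))  # 2.321
```

Shaping the per-step safety margin gives the learner dense feedback during an episode, while the terminal term preserves the temporal ("eventually") part of the specification, which cannot be evaluated until the trace is complete.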
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.