Tractable Reinforcement Learning of Signal Temporal Logic Objectives
- URL: http://arxiv.org/abs/2001.09467v2
- Date: Mon, 17 Feb 2020 15:17:50 GMT
- Title: Tractable Reinforcement Learning of Signal Temporal Logic Objectives
- Authors: Harish Venkataraman, Derya Aksaray, Peter Seiler
- Abstract summary: Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications.
Learning to satisfy STL specifications often requires a sufficient history of states to compute the reward and the next action.
We propose a compact means to capture state history in a new augmented state-space representation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Signal temporal logic (STL) is an expressive language to specify time-bound
real-world robotic tasks and safety specifications. Recently, there has been an
interest in learning optimal policies to satisfy STL specifications via
reinforcement learning (RL). Learning to satisfy STL specifications often
requires a sufficient history of states to compute the reward and the next
action. This need for history causes exponential growth of the state space,
making the learning problem computationally intractable for most real-world
applications. In this paper, we propose a compact means of capturing state
history in a new augmented state-space representation. We propose an
approximation to the objective (maximizing the probability of satisfaction)
and solve it in the new augmented state space. We derive a performance bound
for the approximate solution and compare it with the solution of an existing
technique via simulations.
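As a rough illustration of the augmented-state idea (not the paper's exact construction): for a specification of the form G F[0,d] psi, a counter c recording the steps since psi last held is a compact sufficient statistic of the d-step history, so plain Q-learning can run on the pair (s, c) instead of on full state histories. All names, dynamics, and constants in the sketch below are illustrative.

```python
"""Sketch: Q-learning on an augmented state (s, c), where c counts steps
since the predicate psi last held. For a task like G F[0,d] psi, c is a
compact stand-in for the d-step history (toy construction, not the paper's)."""
import random
from collections import defaultdict

D = 4                               # window length d in F[0,d]
GOAL = {0}                          # states where psi holds (assumed)
N_STATES, ACTIONS = 8, (-1, +1)
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.2

def step(s, c, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    c2 = 0 if s2 in GOAL else min(c + 1, D + 1)         # reset when psi holds
    r = 1.0 if c2 == 0 else (-10.0 if c2 > D else 0.0)  # penalize violation
    return s2, c2, r

Q = defaultdict(float)
for _ in range(5000):                                   # tabular Q-learning
    s, c = random.randrange(N_STATES), 0
    for _ in range(50):
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: Q[(s, c, x)])
        s2, c2, r = step(s, c, a)
        target = r + GAMMA * max(Q[(s2, c2, x)] for x in ACTIONS)
        Q[(s, c, a)] += ALPHA * (target - Q[(s, c, a)])
        s, c = s2, c2
```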
Related papers
- State Chrono Representation for Enhancing Generalization in Reinforcement Learning [36.12688166503104]
In reinforcement learning with image-based inputs, it is crucial to establish a robust and generalizable state representation.
We propose a novel State Chrono Representation (SCR) approach to address these challenges.
SCR augments state metric-based representations by incorporating extensive temporal information into the update step of bisimulation metric learning.
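For context, the bisimulation-metric update that SCR builds on can be sketched in its plain (DBC-style) form; SCR's additional temporal terms are not reproduced here, and the linear encoder and data are toy stand-ins.

```python
"""Plain bisimulation-style metric loss (DBC-like), the base that SCR
augments; encoder and inputs are toy stand-ins, not SCR's architecture."""
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))            # toy linear encoder
encode = lambda obs: obs @ W

def bisim_loss(obs_i, obs_j, r_i, r_j, next_zi, next_zj, gamma=0.99):
    z_i, z_j = encode(obs_i), encode(obs_j)
    d_pred = np.linalg.norm(z_i - z_j, axis=-1)
    # Target: reward difference plus discounted next-embedding distance.
    d_tgt = np.abs(r_i - r_j) + gamma * np.linalg.norm(next_zi - next_zj, axis=-1)
    return np.mean((d_pred - d_tgt) ** 2)
```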
arXiv Detail & Related papers (2024-11-09T13:12:34Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit-Deterministic Büchi Automaton (LDBA) as a Markov reward process.
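Concretely, this view starts from the usual product of the MDP with the automaton, with reward issued on accepting transitions. A minimal hand-coded sketch for the toy formula "F goal" (a real LDBA for richer formulas would come from a translation tool such as Owl or Rabinizer; all names here are illustrative):

```python
# Toy two-state automaton for "F goal": q0 -> q_acc once the goal label is seen.
AUT = {('q0', True): 'q_acc', ('q0', False): 'q0',
       ('q_acc', True): 'q_acc', ('q_acc', False): 'q_acc'}

def product_step(T, label, s, q, a):
    """One step of the product MDP; T and label are assumed callables."""
    s2 = T(s, a)                                    # environment transition
    q2 = AUT[(q, label(s2))]                        # automaton reads the label
    r = 1.0 if (q, q2) == ('q0', 'q_acc') else 0.0  # reward accepting edges
    return (s2, q2), r

# Example: 1-D chain where state 4 is labelled "goal".
T = lambda s, a: max(0, min(4, s + a))
label = lambda s: s == 4
print(product_step(T, label, 3, 'q0', +1))          # -> ((4, 'q_acc'), 1.0)
```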
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
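Here "local simulator access" means the learner may reset the simulator to any previously visited state rather than only to the initial distribution. A minimal wrapper sketch under assumed interfaces (env.state, a gym-like env.step, and hashable observations are assumptions, not the paper's API):

```python
"""Sketch of a local-simulator wrapper: remembers snapshots of visited
states and can jump back to them (interfaces are assumed, not the paper's)."""
import copy

class LocalSimulator:
    def __init__(self, env):
        self.env = env
        self._snapshots = {}                    # obs -> saved internal state

    def step(self, action):
        obs, reward, done = self.env.step(action)   # assumed gym-like API
        self._snapshots[obs] = copy.deepcopy(self.env.state)
        return obs, reward, done

    def reset_to(self, obs):
        """Local planning primitive: restore a previously visited state."""
        self.env.state = copy.deepcopy(self._snapshots[obs])
        return obs
```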
arXiv Detail & Related papers (2024-04-23T18:09:53Z)
- State Sequences Prediction via Fourier Transform for Representation Learning [111.82376793413746]
We propose State Sequences Prediction via Fourier Transform (SPF), a novel method for learning expressive representations efficiently.
We theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity.
Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
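The core idea can be sketched as follows: summarize a rollout of future states by its low-frequency Fourier coefficients and regress a head of the representation onto them (the paper's actual architecture and losses differ; fourier_target and predictor are illustrative names):

```python
"""Toy sketch of SPF's target: low-frequency FFT coefficients of a future
state sequence serve as a regression target for the representation."""
import numpy as np

def fourier_target(state_seq, k=8):
    """First k rFFT coefficients of a (T, dim) sequence (needs T//2+1 >= k)."""
    coeffs = np.fft.rfft(state_seq, axis=0)[:k]
    return np.concatenate([coeffs.real, coeffs.imag]).ravel()

def spf_loss(predictor, z, state_seq):
    """Mean-squared error between a predicted and the true Fourier summary."""
    return float(np.mean((predictor(z) - fourier_target(state_seq)) ** 2))
```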
arXiv Detail & Related papers (2023-10-24T14:47:02Z)
- Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z)
- Funnel-based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning [0.0]
We propose a tractable reinforcement learning algorithm to learn a controller that enforces Signal Temporal Logic (STL) specifications.
We demonstrate the utility of our approach on several STL tasks using different environments.
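The gist of funnel-based shaping, under simple illustrative assumptions: a prescribed envelope gamma(t) shrinks over time, and the shaped reward is the margin by which a robustness-tracking error stays inside it (the exponential funnel and the exact margin below are one simple choice, not the paper's):

```python
"""Hedged sketch of funnel-based shaping: reward is the margin between a
shrinking envelope gamma(t) and the robustness-tracking error."""
import math

def funnel(t, g0=5.0, g_inf=0.2, decay=0.5):
    """Exponentially shrinking envelope gamma(t) (illustrative constants)."""
    return (g0 - g_inf) * math.exp(-decay * t) + g_inf

def shaped_reward(rho_error, t):
    """Positive while the robustness error stays inside the funnel."""
    return funnel(t) - abs(rho_error)   # one simple choice of margin
```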
arXiv Detail & Related papers (2022-11-30T19:38:21Z)
- Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition [2.2082422928825136]
Self-supervised learning is typically used to learn deep feature representations from unlabeled data.
We propose integrating a dynamic time warping (DTW) algorithm in the latent space to force features to be aligned along the temporal dimension.
The proposed approach has great potential for learning robust feature representations compared to recent SSL baselines.
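For reference, the alignment cost in question is classical dynamic time warping; a differentiable relaxation such as soft-DTW would be used inside a training loss, but plain dynamic programming already shows the idea:

```python
"""Plain DP dynamic time warping between two latent sequences (illustration
only; a differentiable relaxation such as soft-DTW is needed for training)."""
import numpy as np

def dtw(z1, z2):
    n, m = len(z1), len(z2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(z1[i - 1] - z2[j - 1])   # local distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```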
arXiv Detail & Related papers (2022-10-07T07:51:01Z)
- Learning Signal Temporal Logic through Neural Network for Interpretable Classification [13.829082181692872]
We propose an explainable neural-symbolic framework for the classification of time-series behaviors.
We demonstrate the computational efficiency, compactness, and interpretability of the proposed method through driving scenarios and naval surveillance case studies.
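Frameworks of this kind differentiate through STL's quantitative (robustness) semantics. A minimal sketch for one operator: the robustness of "eventually (x > c)" is a max of per-time margins, smoothed here with logsumexp so gradients can flow (learning the formula structure itself is not shown):

```python
"""Smoothed robustness of F (x > c) over a finite signal; logsumexp
replaces the hard max so the value is differentiable in the signal."""
import numpy as np

def soft_max(v, beta=10.0):
    """Differentiable approximation of max(v); exact as beta -> infinity."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(beta * (v - m)))) / beta

def robustness_eventually(signal, c):
    """rho(F (x > c)) ~ soft max over the per-time margins x_t - c."""
    return soft_max(np.asarray(signal) - c)

print(robustness_eventually([0.1, 0.4, 0.9], c=0.5))   # ~0.4 (satisfied)
```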
arXiv Detail & Related papers (2022-10-04T21:11:54Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks [16.296473750342464]
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
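As a stand-in for the deep actor and critic, a compact one-step actor-critic with a Gaussian policy on a toy 1-D set-point task (linear features and all constants are illustrative):

```python
"""Compact one-step actor-critic with a Gaussian policy; linear features
stand in for the paper's deep networks (toy task, illustrative constants)."""
import numpy as np

rng = np.random.default_rng(0)
phi = lambda s: np.array([1.0, s, s * s])   # simple state features
w = np.zeros(3)                             # critic weights: V(s) = w . phi
theta = np.zeros(3)                         # actor weights: mu(s) = theta . phi
SIGMA, GAMMA, A_LR, C_LR = 0.5, 0.95, 0.01, 0.05

for episode in range(2000):
    s = rng.uniform(-1, 1)
    for t in range(20):
        mu = theta @ phi(s)
        a = rng.normal(mu, SIGMA)           # sample continuous action
        s2 = np.clip(s + 0.1 * a, -2, 2)    # toy dynamics
        r = -s2 * s2                        # drive the state toward 0
        delta = r + GAMMA * (w @ phi(s2)) - (w @ phi(s))          # TD error
        w += C_LR * delta * phi(s)                                # critic
        theta += A_LR * delta * (a - mu) / SIGMA**2 * phi(s)      # actor
        s = s2
```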
arXiv Detail & Related papers (2020-04-02T17:58:03Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.