Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints
- URL: http://arxiv.org/abs/2002.05615v1
- Date: Thu, 13 Feb 2020 16:38:38 GMT
- Title: Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints
- Authors: Steven Carr, Nils Jansen and Ufuk Topcu
- Abstract summary: A major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications.
By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller from an RNN.
- Score: 31.829932777445894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) have emerged as an effective representation
of control policies in sequential decision-making problems. However, a major
drawback in the application of RNN-based policies is the difficulty in
providing formal guarantees on the satisfaction of behavioral specifications,
e.g. safety and/or reachability. By integrating techniques from formal methods
and machine learning, we propose an approach to automatically extract a
finite-state controller (FSC) from an RNN, which, when composed with a
finite-state system model, is amenable to existing formal verification tools.
Specifically, we introduce an iterative modification to the so-called quantized
bottleneck insertion technique to create an FSC as a randomized policy with
memory. For the cases in which the resulting FSC fails to satisfy the
specification, verification generates diagnostic information. We utilize this
information to either adjust the amount of memory in the extracted FSC or
perform focused retraining of the RNN. While generally applicable, we detail
the resulting iterative procedure in the context of policy synthesis for
partially observable Markov decision processes (POMDPs), which is known to be
notoriously hard. The numerical experiments show that the proposed approach
outperforms traditional POMDP synthesis methods by three orders of magnitude in
computation time while remaining within 2% of optimal benchmark values.
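As a concrete illustration of the extraction step, the sketch below quantizes each hidden unit of a toy recurrent policy to three levels and enumerates the reachable quantized memory states to build an FSC. It is a minimal sketch only: the tanh recurrence with random weights and the direct componentwise quantization stand in for the paper's trained RNN and its learned quantized bottleneck.

```python
import numpy as np

# Toy stand-in for a trained RNN policy: a tanh recurrence with a softmax
# action head. Weights are random here; the paper assumes a policy trained
# on the POMDP.
rng = np.random.default_rng(0)
N_OBS, N_ACT, H = 4, 3, 8
W_h = rng.normal(0.0, 0.5, (H, H))
W_o = rng.normal(0.0, 0.5, (H, N_OBS))
W_a = rng.normal(0.0, 0.5, (N_ACT, H))

def rnn_step(h, obs):
    x = np.zeros(N_OBS)
    x[obs] = 1.0
    return np.tanh(W_h @ h + W_o @ x)

def action_dist(h):
    z = W_a @ h
    e = np.exp(z - z.max())
    return e / e.sum()

def quantize(h, thresh=0.5):
    # Three-level quantization of each hidden unit, as in quantized
    # bottleneck networks; the finite set of quantized vectors becomes
    # the FSC's memory nodes.
    return tuple(int(v) for v in np.where(h > thresh, 1,
                                          np.where(h < -thresh, -1, 0)))

# Enumerate reachable (memory node, observation) transitions.
nodes, transitions = {}, {}
frontier = [quantize(np.zeros(H))]
while frontier:
    q = frontier.pop()
    if q in nodes:
        continue
    h = np.array(q, dtype=float)       # quantized vector as the node's state
    nodes[q] = action_dist(h)          # randomized action choice at node q
    for obs in range(N_OBS):
        q2 = quantize(rnn_step(h, obs))
        transitions[(q, obs)] = q2     # memory update on observing obs
        frontier.append(q2)

print(f"extracted FSC with {len(nodes)} memory nodes")
```

The resulting `nodes` (per-node action distributions) and `transitions` tables define a randomized policy with memory that can be composed with a finite-state POMDP model and passed to a probabilistic model checker such as Storm or PRISM.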
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
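C-PG's exact update is specified in that paper; purely as a hedged illustration of the primal-dual flavor of constrained policy gradients, the toy below alternates a softmax policy-gradient step on a Lagrangian with a dual ascent step on the multiplier (a single-state problem with invented rewards and costs).

```python
import numpy as np

# Hypothetical single-state problem: 3 actions with known expected rewards
# and costs, and a budget constraint E[cost] <= d. All numbers invented.
r = np.array([1.0, 0.8, 0.2])    # expected reward per action
c = np.array([0.9, 0.4, 0.1])    # expected cost per action
d = 0.5                          # cost budget

theta = np.zeros(3)              # softmax policy parameters
lam = 0.0                        # Lagrange multiplier

for _ in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()
    g = r - lam * c                           # Lagrangian payoff per action
    theta += 0.1 * p * (g - p @ g)            # exact softmax policy gradient
    lam = max(0.0, lam + 0.05 * (p @ c - d))  # dual ascent on the violation

print("policy:", np.round(p, 3), "E[cost]:", round(float(p @ c), 3))
```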
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
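A minimal sketch of the decision-rule idea: instead of solving the second-stage problem exactly for each scenario, a parametric policy maps the observed uncertainty to a recourse decision and is trained end-to-end together with the first-stage decision. This two-stage toy uses a feedforward network and invented costs; TS-LDR itself uses deep recurrent networks for the multistage setting.

```python
import torch

torch.manual_seed(0)
c_build, c_short, c_rec = 1.0, 5.0, 2.0   # invented cost coefficients

x = torch.zeros(1, requires_grad=True)    # first-stage (here-and-now) decision
rule = torch.nn.Sequential(               # second-stage decision rule y(xi)
    torch.nn.Linear(1, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
opt = torch.optim.Adam([x, *rule.parameters()], lr=0.02)

for _ in range(500):
    xi = torch.rand(256, 1) * 10.0        # sampled demand scenarios
    y = torch.relu(rule(xi))              # nonnegative recourse decision
    shortage = torch.relu(xi - x - y)     # unmet demand
    cost = c_build * x + c_short * shortage + c_rec * y
    opt.zero_grad(); cost.mean().backward(); opt.step()

print(f"learned first-stage decision x = {x.item():.2f}")
```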
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks [15.331024247043999]
Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge.
We propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle these challenges for natural language tasks.
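For readers unfamiliar with the target formalism: a WFA assigns each word a weight via an initial vector, one transition matrix per symbol, and a final vector, and extraction methods fit these objects to the RNN's behaviour. A minimal, hand-made example (all matrices invented):

```python
import numpy as np

# A WFA assigns each word the weight alpha^T A_{w1} ... A_{wn} beta.
alpha = np.array([1.0, 0.0])                    # initial weight vector
beta = np.array([0.0, 1.0])                     # final weight vector
A = {"a": np.array([[0.5, 0.5],                 # transition matrix for 'a'
                    [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0],                 # transition matrix for 'b'
                    [0.2, 0.8]])}

def word_weight(word):
    v = alpha
    for sym in word:                            # one matrix product per symbol
        v = v @ A[sym]
    return float(v @ beta)

print(word_weight("ab"), word_weight("ba"))     # 0.4 and 0.5
```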
arXiv Detail & Related papers (2023-06-24T19:16:56Z)
- Learning Stochastic Parametric Differentiable Predictive Control Policies [2.042924346801313]
We present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the stochastic parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop constraint and chance constraint satisfaction.
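A hedged sketch of the differentiable-predictive-control training loop: a neural policy is learned offline by backpropagating a control loss through rollouts of a known system model, with constraints handled by penalties. The linear model, horizon, and penalty weights below are invented, and SP-DPC's stochastic, chance-constrained machinery is reduced here to sampling initial conditions.

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.1],          # invented linear system model
                  [0.0, 1.0]])
B = torch.tensor([[0.0],
                  [0.1]])
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(400):
    x = torch.randn(64, 2) * 2.0       # sampled initial states
    loss = torch.zeros(())
    for _ in range(20):                # backprop through the rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(-1).mean()               # regulation cost
        loss = loss + 10.0 * torch.relu(u.abs() - 1).mean() # |u| <= 1 penalty
    opt.zero_grad(); loss.backward(); opt.step()

print("trained; final batch loss:", round(loss.item(), 2))
```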
arXiv Detail & Related papers (2022-03-02T22:46:32Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
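The core of such constructions is to run the MDP and the automaton in lockstep and derive rewards from acceptance. The sketch below shows a generic product step with hypothetical `mdp_step` and `automaton_delta` interfaces; the paper's EP-MDP additionally tracks accepting-set frontiers of the LDGBA, which is omitted here.

```python
# Hypothetical interfaces: mdp_step returns (next_state, label), where label
# is the set of atomic propositions holding in next_state; automaton_delta
# is the automaton's transition function; accepting is a set of its states.
def product_step(mdp_step, automaton_delta, accepting, state, q, action):
    next_state, label = mdp_step(state, action)
    next_q = automaton_delta(q, label)            # automaton reads the label
    reward = 1.0 if next_q in accepting else 0.0  # shaped acceptance reward
    return (next_state, next_q), reward
```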
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Chance-Constrained Control with Lexicographic Deep Reinforcement Learning [77.34726150561087]
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes.
A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
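Lexicographic here means the safety objective is prioritized: the agent first restricts attention to actions that are near-optimal for constraint satisfaction and only then maximizes reward among them. A minimal sketch of such an action-selection rule over two Q-value heads (the paper's exact criterion may differ):

```python
import numpy as np

def lexicographic_action(q_safety, q_reward, slack=0.05):
    """Pick the best-reward action among actions whose safety value is
    within `slack` of the safest action."""
    safe_enough = q_safety >= q_safety.max() - slack
    masked = np.where(safe_enough, q_reward, -np.inf)
    return int(np.argmax(masked))

# Action 2 has the best reward but is unsafe; action 1 is near-safest
# and wins among the safe-enough actions, so this prints 1.
print(lexicographic_action(np.array([0.90, 0.88, 0.30]),
                           np.array([0.2, 1.0, 5.0])))
```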
arXiv Detail & Related papers (2020-10-19T13:09:14Z)
- Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
- Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP).
We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
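Point-based value iteration keeps the value function as a set of alpha-vectors and performs Bellman backups only at a fixed set of belief points. The sketch below runs generic discounted PBVI on a random toy POMDP; in the paper the same machinery bounds the probability of satisfying a logical formula rather than a discounted reward.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A_N, O, gamma = 3, 2, 2, 0.95       # states, actions, observations

# Random toy POMDP: T[a][s, s'], Z[a][s', o], R[a][s] (all invented).
T = rng.dirichlet(np.ones(S), size=(A_N, S))
Z = rng.dirichlet(np.ones(O), size=(A_N, S))
R = rng.uniform(0.0, 1.0, size=(A_N, S))

B = rng.dirichlet(np.ones(S), size=16)  # fixed set of belief points
alphas = [np.zeros(S)]                  # alpha-vector set (piecewise-linear V)

for _ in range(50):                     # point-based backups
    new = []
    for b in B:
        best = None
        for a in range(A_N):
            g = R[a].copy()
            for o in range(O):
                # back-projected vectors: g_ao_i(s) = sum_s' T*Z*alpha_i(s')
                cand = [T[a] @ (Z[a][:, o] * al) for al in alphas]
                g = g + gamma * max(cand, key=lambda v: float(b @ v))
            if best is None or b @ g > b @ best:
                best = g
        new.append(best)
    alphas = new

print("V(B[0]) =", round(float(max(B[0] @ al for al in alphas)), 3))
```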
arXiv Detail & Related papers (2020-01-11T23:09:25Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.