Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints
- URL: http://arxiv.org/abs/2002.05615v1
- Date: Thu, 13 Feb 2020 16:38:38 GMT
- Title: Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints
- Authors: Steven Carr, Nils Jansen and Ufuk Topcu
- Abstract summary: A major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications.
By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller from an RNN.
- Score: 31.829932777445894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) have emerged as an effective representation
of control policies in sequential decision-making problems. However, a major
drawback in the application of RNN-based policies is the difficulty in
providing formal guarantees on the satisfaction of behavioral specifications,
e.g. safety and/or reachability. By integrating techniques from formal methods
and machine learning, we propose an approach to automatically extract a
finite-state controller (FSC) from an RNN, which, when composed with a
finite-state system model, is amenable to existing formal verification tools.
Specifically, we introduce an iterative modification to the so-called quantized
bottleneck insertion technique to create an FSC as a randomized policy with
memory. For the cases in which the resulting FSC fails to satisfy the
specification, verification generates diagnostic information. We utilize this
information to either adjust the amount of memory in the extracted FSC or
perform focused retraining of the RNN. While generally applicable, we detail
the resulting iterative procedure in the context of policy synthesis for
partially observable Markov decision processes (POMDPs), which is known to be
notoriously hard. The numerical experiments show that the proposed approach
outperforms traditional POMDP synthesis methods by three orders of magnitude in
computation time while remaining within 2% of optimal benchmark values.
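As a concrete illustration of the extraction step, the sketch below quantizes each hidden unit of a toy recurrent policy to three levels and enumerates the reachable quantized memory states to build an FSC. It is a minimal sketch only: the tanh recurrence with random weights and the direct componentwise quantization stand in for the paper's trained RNN and its learned quantized bottleneck.

```python
import numpy as np

# Toy stand-in for a trained RNN policy: a tanh recurrence with a softmax
# action head. Weights are random here; the paper assumes a policy trained
# on the POMDP.
rng = np.random.default_rng(0)
N_OBS, N_ACT, H = 4, 3, 8
W_h = rng.normal(0.0, 0.5, (H, H))
W_o = rng.normal(0.0, 0.5, (H, N_OBS))
W_a = rng.normal(0.0, 0.5, (N_ACT, H))

def rnn_step(h, obs):
    x = np.zeros(N_OBS)
    x[obs] = 1.0
    return np.tanh(W_h @ h + W_o @ x)

def action_dist(h):
    z = W_a @ h
    e = np.exp(z - z.max())
    return e / e.sum()

def quantize(h, thresh=0.5):
    # Three-level quantization of each hidden unit, as in quantized
    # bottleneck networks; the finite set of quantized vectors becomes
    # the FSC's memory nodes.
    return tuple(int(v) for v in np.where(h > thresh, 1,
                                          np.where(h < -thresh, -1, 0)))

# Enumerate reachable (memory node, observation) transitions.
nodes, transitions = {}, {}
frontier = [quantize(np.zeros(H))]
while frontier:
    q = frontier.pop()
    if q in nodes:
        continue
    h = np.array(q, dtype=float)       # quantized vector as the node's state
    nodes[q] = action_dist(h)          # randomized action choice at node q
    for obs in range(N_OBS):
        q2 = quantize(rnn_step(h, obs))
        transitions[(q, obs)] = q2     # memory update on observing obs
        frontier.append(q2)

print(f"extracted FSC with {len(nodes)} memory nodes")
```

The resulting `nodes` (per-node action distributions) and `transitions` tables define a randomized policy with memory that can be composed with a finite-state POMDP model and passed to a probabilistic model checker such as Storm or PRISM.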
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
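C-PG's exact update is specified in that paper; purely as a hedged illustration of the primal-dual flavor of constrained policy gradients, the toy below alternates a softmax policy-gradient step on a Lagrangian with a dual ascent step on the multiplier (a single-state problem with invented rewards and costs).

```python
import numpy as np

# Hypothetical single-state problem: 3 actions with known expected rewards
# and costs, and a budget constraint E[cost] <= d. All numbers invented.
r = np.array([1.0, 0.8, 0.2])    # expected reward per action
c = np.array([0.9, 0.4, 0.1])    # expected cost per action
d = 0.5                          # cost budget

theta = np.zeros(3)              # softmax policy parameters
lam = 0.0                        # Lagrange multiplier

for _ in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()
    g = r - lam * c                           # Lagrangian payoff per action
    theta += 0.1 * p * (g - p @ g)            # exact softmax policy gradient
    lam = max(0.0, lam + 0.05 * (p @ c - d))  # dual ascent on the violation

print("policy:", np.round(p, 3), "E[cost]:", round(float(p @ c), 3))
```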
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
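A minimal sketch of the decision-rule idea: instead of solving the second-stage problem exactly for each scenario, a parametric policy maps the observed uncertainty to a recourse decision and is trained end-to-end together with the first-stage decision. This two-stage toy uses a feedforward network and invented costs; TS-LDR itself uses deep recurrent networks for the multistage setting.

```python
import torch

torch.manual_seed(0)
c_build, c_short, c_rec = 1.0, 5.0, 2.0   # invented cost coefficients

x = torch.zeros(1, requires_grad=True)    # first-stage (here-and-now) decision
rule = torch.nn.Sequential(               # second-stage decision rule y(xi)
    torch.nn.Linear(1, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
opt = torch.optim.Adam([x, *rule.parameters()], lr=0.02)

for _ in range(500):
    xi = torch.rand(256, 1) * 10.0        # sampled demand scenarios
    y = torch.relu(rule(xi))              # nonnegative recourse decision
    shortage = torch.relu(xi - x - y)     # unmet demand
    cost = c_build * x + c_short * shortage + c_rec * y
    opt.zero_grad(); cost.mean().backward(); opt.step()

print(f"learned first-stage decision x = {x.item():.2f}")
```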
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks [15.331024247043999]
Recurrent Neural Networks (RNNs) have achieved tremendous success in processing sequential data, yet understanding and analyzing their behaviours remains a significant challenge.
We propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle these challenges for natural language tasks.
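For readers unfamiliar with the target formalism: a WFA assigns each word a weight via an initial vector, one transition matrix per symbol, and a final vector, and extraction methods fit these objects to the RNN's behaviour. A minimal, hand-made example (all matrices invented):

```python
import numpy as np

# A WFA assigns each word the weight alpha^T A_{w1} ... A_{wn} beta.
alpha = np.array([1.0, 0.0])                    # initial weight vector
beta = np.array([0.0, 1.0])                     # final weight vector
A = {"a": np.array([[0.5, 0.5],                 # transition matrix for 'a'
                    [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0],                 # transition matrix for 'b'
                    [0.2, 0.8]])}

def word_weight(word):
    v = alpha
    for sym in word:                            # one matrix product per symbol
        v = v @ A[sym]
    return float(v @ beta)

print(word_weight("ab"), word_weight("ba"))     # 0.4 and 0.5
```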
arXiv Detail & Related papers (2023-06-24T19:16:56Z)
- Learning Stochastic Parametric Differentiable Predictive Control Policies [2.042924346801313]
We present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the stochastic parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop constraint and chance constraint satisfaction.
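A hedged sketch of the differentiable-predictive-control training loop: a neural policy is learned offline by backpropagating a control loss through rollouts of a known system model, with constraints handled by penalties. The linear model, horizon, and penalty weights below are invented, and SP-DPC's stochastic, chance-constrained machinery is reduced here to sampling initial conditions.

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.1],          # invented linear system model
                  [0.0, 1.0]])
B = torch.tensor([[0.0],
                  [0.1]])
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(400):
    x = torch.randn(64, 2) * 2.0       # sampled initial states
    loss = torch.zeros(())
    for _ in range(20):                # backprop through the rollout
        u = policy(x)
        x = x @ A.T + u @ B.T
        loss = loss + (x ** 2).sum(-1).mean()               # regulation cost
        loss = loss + 10.0 * torch.relu(u.abs() - 1).mean() # |u| <= 1 penalty
    opt.zero_grad(); loss.backward(); opt.step()

print("trained; final batch loss:", round(loss.item(), 2))
```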
arXiv Detail & Related papers (2022-03-02T22:46:32Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
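The core of such constructions is to run the MDP and the automaton in lockstep and derive rewards from acceptance. The sketch below shows a generic product step with hypothetical `mdp_step` and `automaton_delta` interfaces; the paper's EP-MDP additionally tracks accepting-set frontiers of the LDGBA, which is omitted here.

```python
# Hypothetical interfaces: mdp_step returns (next_state, label), where label
# is the set of atomic propositions holding in next_state; automaton_delta
# is the automaton's transition function; accepting is a set of its states.
def product_step(mdp_step, automaton_delta, accepting, state, q, action):
    next_state, label = mdp_step(state, action)
    next_q = automaton_delta(q, label)            # automaton reads the label
    reward = 1.0 if next_q in accepting else 0.0  # shaped acceptance reward
    return (next_state, next_q), reward
```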
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Chance-Constrained Control with Lexicographic Deep Reinforcement Learning [77.34726150561087]
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes.
A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
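Lexicographic here means the safety objective is prioritized: the agent first restricts attention to actions that are near-optimal for constraint satisfaction and only then maximizes reward among them. A minimal sketch of such an action-selection rule over two Q-value heads (the paper's exact criterion may differ):

```python
import numpy as np

def lexicographic_action(q_safety, q_reward, slack=0.05):
    """Pick the best-reward action among actions whose safety value is
    within `slack` of the safest action."""
    safe_enough = q_safety >= q_safety.max() - slack
    masked = np.where(safe_enough, q_reward, -np.inf)
    return int(np.argmax(masked))

# Action 2 has the best reward but is unsafe; action 1 is near-safest
# and wins among the safe-enough actions, so this prints 1.
print(lexicographic_action(np.array([0.90, 0.88, 0.30]),
                           np.array([0.2, 1.0, 5.0])))
```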
arXiv Detail & Related papers (2020-10-19T13:09:14Z)
- Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
- Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP).
We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
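Point-based value iteration keeps the value function as a set of alpha-vectors and performs Bellman backups only at a fixed set of belief points. The sketch below runs generic discounted PBVI on a random toy POMDP; in the paper the same machinery bounds the probability of satisfying a logical formula rather than a discounted reward.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A_N, O, gamma = 3, 2, 2, 0.95       # states, actions, observations

# Random toy POMDP: T[a][s, s'], Z[a][s', o], R[a][s] (all invented).
T = rng.dirichlet(np.ones(S), size=(A_N, S))
Z = rng.dirichlet(np.ones(O), size=(A_N, S))
R = rng.uniform(0.0, 1.0, size=(A_N, S))

B = rng.dirichlet(np.ones(S), size=16)  # fixed set of belief points
alphas = [np.zeros(S)]                  # alpha-vector set (piecewise-linear V)

for _ in range(50):                     # point-based backups
    new = []
    for b in B:
        best = None
        for a in range(A_N):
            g = R[a].copy()
            for o in range(O):
                # back-projected vectors: g_ao_i(s) = sum_s' T*Z*alpha_i(s')
                cand = [T[a] @ (Z[a][:, o] * al) for al in alphas]
                g = g + gamma * max(cand, key=lambda v: float(b @ v))
            if best is None or b @ g > b @ best:
                best = g
        new.append(best)
    alphas = new

print("V(B[0]) =", round(float(max(B[0] @ al for al in alphas)), 3))
```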
arXiv Detail & Related papers (2020-01-11T23:09:25Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.