Stochastic Finite State Control of POMDPs with LTL Specifications
- URL: http://arxiv.org/abs/2001.07679v1
- Date: Tue, 21 Jan 2020 18:10:47 GMT
- Title: Stochastic Finite State Control of POMDPs with LTL Specifications
- Authors: Mohamadreza Ahmadi, Rangoli Sharan, and Joel W. Burdick
- Abstract summary: Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty.
This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs.
We propose a stochastic bounded policy iteration algorithm, leading to controlled growth in sFSC size; the result is an anytime algorithm in which the performance of the controller improves with successive iterations.
- Score: 14.163899014007647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling
framework for autonomous decision making under uncertainty and imperfect
sensing, e.g. robot manipulation and self-driving cars. However, optimal
control of POMDPs is notoriously intractable. This paper considers the
quantitative problem of synthesizing sub-optimal stochastic finite state
controllers (sFSCs) for POMDPs such that the probability of satisfying a set of
high-level specifications in terms of linear temporal logic (LTL) formulae is
maximized. We begin by casting the latter problem as an optimization problem and
use relaxations based on the Poisson equation and McCormick envelopes. Then, we
propose a stochastic bounded policy iteration algorithm, leading to controlled
growth in sFSC size and yielding an anytime algorithm, where the performance of
the controller improves with successive iterations but can be stopped by
the user based on time or memory considerations. We illustrate the proposed
method by a robot navigation case study.
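The relaxation step mentioned in the abstract can be made concrete with the standard McCormick envelope: a bilinear term (for instance, a product of an sFSC parameter and a value-type variable in the synthesis optimization) is replaced by linear under- and over-estimators on a box. The snippet below is a minimal sketch of that envelope for a single product w = x*y; it is not the paper's formulation, and the bounds and variable names are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's formulation): the standard McCormick
# envelope bounds a bilinear term w = x * y, with x in [xl, xu] and y in
# [yl, yu], by linear under- and over-estimators.

def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Return (lower, upper) envelope values for w = x * y at the point (x, y)."""
    lower = max(xl * y + x * yl - xl * yl,
                xu * y + x * yu - xu * yu)
    upper = min(xu * y + x * yl - xu * yl,
                xl * y + x * yu - xl * yu)
    return lower, upper

if __name__ == "__main__":
    # Probabilities such as sFSC action/transition parameters live in [0, 1],
    # so a unit box is the natural choice in that setting.
    xl, xu, yl, yu = 0.0, 1.0, 0.0, 1.0
    for x, y in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]:
        lo, hi = mccormick_bounds(x, y, xl, xu, yl, yu)
        assert lo <= x * y <= hi  # the envelope always contains the true product
        print(f"x*y = {x * y:.2f}, envelope = [{lo:.2f}, {hi:.2f}]")
```

In a solver, these four inequalities would be imposed as constraints on a fresh variable w replacing each product; the tightness of the resulting linear relaxation depends on the box bounds.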
Related papers
- Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes [1.445706856497821]
This work defines an MDP framework, the SD-MDP, where we disentangle the causal structure of MDPs' transition and reward dynamics.
We derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-23T16:22:40Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- Stability-informed Bayesian Optimization for MPC Cost Function Learning [5.643541009427271]
This work explores closed-loop learning for predictive control parameters under imperfect information.
We employ constrained Bayesian optimization to learn a model predictive controller's (MPC) cost function parametrized as a feedforward neural network.
We extend this framework with stability constraints on the learned controller parameters, exploiting the optimal value function of the underlying MPC as a Lyapunov candidate.
arXiv Detail & Related papers (2024-04-18T13:49:09Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Formal Controller Synthesis for Markov Jump Linear Systems with
Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems (MJLS).
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular a temperature control problem and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z) - Learning Stochastic Parametric Differentiable Predictive Control
Policies [2.042924346801313]
We present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop constraints and chance satisfaction.
arXiv Detail & Related papers (2022-03-02T22:46:32Z) - Neural Predictive Control for the Optimization of Smart Grid Flexibility
Schedules [0.0]
Model predictive control (MPC) is a method for formulating the optimal scheduling problem for grid flexibilities mathematically.
MPC methods promise accurate results for time-constrained grid optimization, but they are inherently limited by the calculation time needed for large and complex power system models.
A Neural Predictive Control scheme is proposed to learn optimal control policies for linear and nonlinear power systems through imitation.
arXiv Detail & Related papers (2021-08-19T15:12:35Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Buchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
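The product construction referenced here can be illustrated generically: pair each model state with an automaton state so that rewards and acceptance can depend on the pair. The sketch below is an assumed, simplified product between an MDP and a deterministic automaton reading labels; it is not the EP-MDP defined in that paper, and all names are illustrative.

```python
# Generic sketch (assumption: NOT the paper's EP-MDP) of a product between an
# MDP and a deterministic automaton over atomic-proposition labels.
from itertools import product as cartesian

def build_product(mdp_states, actions, trans, label, aut_states, aut_delta):
    """trans[(s, a)] -> list of (s_next, prob); label[s] -> frozenset of propositions;
    aut_delta[(q, sigma)] -> q_next. Returns transitions over product states (s, q)."""
    prod_trans = {}
    for (s, q), a in cartesian(cartesian(mdp_states, aut_states), actions):
        moves = [((s2, aut_delta[(q, label[s2])]), p)  # automaton reads the new label
                 for s2, p in trans.get((s, a), [])]
        if moves:
            prod_trans[((s, q), a)] = moves
    return prod_trans

if __name__ == "__main__":
    # Toy model: proposition "goal" holds only in state 1; the automaton
    # remembers whether "goal" has been visited.
    states, actions = [0, 1], ["go"]
    trans = {(0, "go"): [(0, 0.5), (1, 0.5)], (1, "go"): [(1, 1.0)]}
    label = {0: frozenset(), 1: frozenset({"goal"})}
    aut_states = ["q0", "q1"]
    aut_delta = {("q0", frozenset()): "q0", ("q0", frozenset({"goal"})): "q1",
                 ("q1", frozenset()): "q1", ("q1", frozenset({"goal"})): "q1"}
    for key, moves in build_product(states, actions, trans, label, aut_states, aut_delta).items():
        print(key, "->", moves)
```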
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Reinforcement Learning Based Temporal Logic Control with Soft
Constraints Using Limit-deterministic Generalized Buchi Automata [0.0]
We study the control synthesis of motion planning subject to uncertainties.
Uncertainties in both the robot motion and the environment properties are considered, giving rise to a probabilistic labeled Markov decision process (MDP).
arXiv Detail & Related papers (2021-01-25T18:09:11Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.