Controller Synthesis for Omega-Regular and Steady-State Specifications
- URL: http://arxiv.org/abs/2106.02951v1
- Date: Sat, 5 Jun 2021 19:34:22 GMT
- Title: Controller Synthesis for Omega-Regular and Steady-State Specifications
- Authors: Alvaro Velasquez, Ashutosh Trivedi, Ismail Alkhouri, Andre Beckus, and
George Atia
- Abstract summary: We present an algorithm to find a deterministic policy satisfying $\omega$-regular and steady-state constraints.
We experimentally evaluate our approach.
- Score: 9.901800502055929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a Markov decision process (MDP) and a linear-time ($\omega$-regular or
LTL) specification, the controller synthesis problem aims to compute the
optimal policy that satisfies the specification. More recently, problems that
reason over the asymptotic behavior of systems have been proposed through the
lens of steady-state planning. This entails finding a control policy for an MDP
such that the Markov chain induced by the solution policy satisfies a given set
of constraints on its steady-state distribution. This paper studies a
generalization of the controller synthesis problem for a linear-time
specification under steady-state constraints on the asymptotic behavior. We
present an algorithm to find a deterministic policy satisfying $\omega$-regular
and steady-state constraints by characterizing the solutions as an integer
linear program, and experimentally evaluate our approach.
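The integer linear programming characterization can be made concrete. Below is a minimal sketch, assuming a unichain MDP given as an explicit transition tensor and the PuLP modeling library: occupation variables x(s, a) encode the steady-state distribution of the induced chain, binary indicators enforce a deterministic policy, and interval bounds impose the steady-state constraints. The MDP data, rewards, and bounds are hypothetical, and the ω-regular side of the paper's construction (a product with an automaton) is omitted.

```python
import pulp

# Hypothetical 3-state, 2-action unichain MDP: P[s][a][s2] = Pr(s2 | s, a).
P = [
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.0, 0.8, 0.2], [0.5, 0.5, 0.0]],
    [[0.3, 0.0, 0.7], [0.1, 0.2, 0.7]],
]
S, A = range(3), range(2)
reward = {(0, 0): 1.0, (1, 1): 2.0}   # sparse long-run average rewards
steady_bounds = {2: (0.1, 0.4)}       # steady-state constraint on state 2

prob = pulp.LpProblem("steady_state_synthesis", pulp.LpMaximize)
pairs = [(s, a) for s in S for a in A]
x = pulp.LpVariable.dicts("x", pairs, lowBound=0)    # occupation measure
d = pulp.LpVariable.dicts("d", pairs, cat="Binary")  # action choice

# Objective: long-run average reward under the steady-state distribution.
prob += pulp.lpSum(reward.get((s, a), 0.0) * x[s, a] for s in S for a in A)

# x must be a stationary distribution of the induced Markov chain.
prob += pulp.lpSum(x[s, a] for s in S for a in A) == 1
for s in S:
    prob += pulp.lpSum(x[s, a] for a in A) == pulp.lpSum(
        P[s2][a2][s] * x[s2, a2] for s2 in S for a2 in A)

# Interval constraints on the steady-state probabilities of labeled states.
for s, (lo, hi) in steady_bounds.items():
    prob += pulp.lpSum(x[s, a] for a in A) >= lo
    prob += pulp.lpSum(x[s, a] for a in A) <= hi

# Determinism: one action per state, and occupation may only use that action
# (x <= d is a valid big-M linking constraint since the x values sum to 1).
for s in S:
    prob += pulp.lpSum(d[s, a] for a in A) == 1
    for a in A:
        prob += x[s, a] <= d[s, a]

prob.solve()
policy = {s: next(a for a in A if d[s, a].value() > 0.5) for s in S}
print("policy:", policy, "objective:", pulp.value(prob.objective))
```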
Related papers
- A Non-Asymptotic Theory of Seminorm Lyapunov Stability: From Deterministic to Stochastic Iterative Algorithms [15.764613607477887]
We study the problem of solving fixed-point equations for seminorm-contractive operators.
We establish the non-asymptotic behavior of iterative algorithms in both deterministic and stochastic settings.
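As a toy illustration of the deterministic setting, the sketch below runs Picard iteration with the discounted Bellman operator, a sup-norm contraction (a special case of the seminorm-contractive operators studied); the error contracts geometrically. The Markov reward process data is made up.

```python
import numpy as np

# Discounted Bellman operator T(v) = r + gamma * P v: a gamma-contraction
# in the sup-norm, a special case of the seminorm-contractive setting.
rng = np.random.default_rng(0)
n, gamma = 5, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)  # row-stochastic
r = rng.random(n)

T = lambda v: r + gamma * P @ v
v_star = np.linalg.solve(np.eye(n) - gamma * P, r)  # exact fixed point

# Picard iteration: non-asymptotically, ||v_k - v*|| <= gamma**k * ||v_0 - v*||.
v = np.zeros(n)
for k in range(25):
    v = T(v)
print("sup-norm error after 25 steps:", np.max(np.abs(v - v_star)))
```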
arXiv Detail & Related papers (2025-02-20T02:39:37Z) - A learning-based approach to stochastic optimal control under reach-avoid constraint [7.036452261968767]
We develop a model-free approach to optimally control Markovian systems subject to a reach-avoid constraint.
We prove that under suitable assumptions, the policy parameters converge to the optimal parameters, while ensuring that the system trajectories satisfy the reach-avoid constraint with high probability.
arXiv Detail & Related papers (2024-12-21T10:07:40Z) - Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action [10.219627570276689]
We develop a framework for a class of Markov Decision Processes with general state and action spaces.
We show that gradient methods converge to the globally optimal policy with nonasymptotic convergence guarantees.
Our result establishes the first complexity bound for multi-period inventory systems.
arXiv Detail & Related papers (2024-09-25T17:56:02Z) - Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this is the first work to propose a deterministic policy search method for continuous-space constrained MDPs.
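The primal-dual gradient pattern underlying such methods can be illustrated on a toy convex program (this is not D-PGPD itself, just the generic primal-descent / projected-dual-ascent loop it builds on; the problem data is made up).

```python
import numpy as np

# Toy convex program: minimize ||x - c||^2 subject to 1 - sum(x) <= 0.
# Primal gradient descent on the Lagrangian, projected gradient ascent on
# the multiplier. For this c the optimum is x* = c + 0.35, lambda* = 0.7.
c = np.array([0.1, 0.2])
x, lam = np.zeros(2), 0.0
eta_x, eta_lam = 0.1, 0.1
for _ in range(2000):
    grad_x = 2 * (x - c) - lam   # d/dx [ ||x - c||^2 + lam * (1 - sum(x)) ]
    x = x - eta_x * grad_x                         # primal descent step
    lam = max(0.0, lam + eta_lam * (1 - x.sum()))  # projected dual ascent
print("x:", x, "lambda:", lam, "constraint slack:", x.sum() - 1)
```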
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an adaptive model predictive control algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm relative to a comparator that knows the system dynamics from the start.
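As a rough illustration of the certainty-equivalence principle only (the polytopic tubes and the regret analysis are omitted), the sketch below fits an unknown scalar system by least squares and then controls it with a finite-horizon LQR gain computed from the estimates; the system and all constants are made up.

```python
import numpy as np

# Unknown scalar system x+ = a*x + b*u + w; the controller only sees data.
rng = np.random.default_rng(1)
a_true, b_true, noise = 0.9, 0.5, 0.01
Q, R, H = 1.0, 0.1, 20  # stage costs and planning horizon

def ce_gain(a, b):
    """Finite-horizon LQR gain via a backward scalar Riccati recursion."""
    p, k = Q, 0.0
    for _ in range(H):
        k = a * b * p / (R + b * b * p)
        p = Q + a * a * p - a * b * p * k
    return k

# Phase 1: excite the system with random inputs and record transitions.
x, data = 1.0, []
for _ in range(20):
    u = rng.standard_normal()
    x_next = a_true * x + b_true * u + noise * rng.standard_normal()
    data.append((x, u, x_next)); x = x_next

# Phase 2: certainty equivalence -- fit (a, b) by least squares, then
# control as if the estimates were exact.
X, U, Xn = (np.array(col) for col in zip(*data))
theta, *_ = np.linalg.lstsq(np.column_stack([X, U]), Xn, rcond=None)
a_hat, b_hat = theta
k = ce_gain(a_hat, b_hat)
for _ in range(30):
    x = a_true * x + b_true * (-k * x) + noise * rng.standard_normal()
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, final |x|={abs(x):.4f}")
```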
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
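TD(0) with linear function approximation is one canonical policy-evaluation algorithm of the kind analyzed; a minimal sketch on a made-up Markov reward process follows (the random features give only an approximate fit, and the constant step size is a simplification).

```python
import numpy as np

# TD(0) with linear function approximation on a synthetic Markov reward
# process: learn w so that phi[s] @ w approximates the value function.
rng = np.random.default_rng(0)
n, d, gamma, alpha = 6, 3, 0.9, 0.05
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)
phi = rng.standard_normal((n, d))  # feature map: state s -> phi[s]

w = np.zeros(d)
s = 0
for t in range(100_000):
    s_next = rng.choice(n, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]  # semi-gradient update
    s = s_next

v_hat = phi @ w
v_true = np.linalg.solve(np.eye(n) - gamma * P, r)
print("approx:", np.round(v_hat, 2))
print("true:  ", np.round(v_true, 2))
```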
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear
Systems via Sums-of-Squares Optimization [28.627377507894003]
We present a method for synthesizing stable, reduced-order output-feedback policies for perception-based control of nonlinear systems.
We show that where policies controlling these systems from images can fail to reliably stabilize them, our approach provides stability guarantees.
arXiv Detail & Related papers (2023-04-24T19:34:09Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multi-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Learning Stochastic Parametric Differentiable Predictive Control
Policies [2.042924346801313]
We present a scalable alternative called stochastic parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop constraint and chance-constraint satisfaction.
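A generic differentiable-predictive-control training loop of this flavor (not the authors' code) rolls a known model forward through a neural policy and penalizes constraint violations; the linear system, penalty weights, and network below are made up, and PyTorch is assumed.

```python
import torch

# Sampled initial conditions roll through a known linear model under a
# neural policy; the loss adds quadratic cost plus a penalty for violating
# the box state constraint |x_i| <= x_max.
torch.manual_seed(0)
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

H, x_max = 20, 2.0
for step in range(2000):
    x = 4.0 * torch.rand(256, 2) - 2.0   # batch of sampled initial states
    loss = 0.0
    for _ in range(H):
        u = policy(x)
        x = x @ A.T + u @ B.T            # differentiable model rollout
        loss = loss + (x ** 2).sum(dim=1).mean() + 0.01 * (u ** 2).mean()
        loss = loss + 10.0 * torch.relu(x.abs() - x_max).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("final batch loss:", float(loss))
```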
arXiv Detail & Related papers (2022-03-02T22:46:32Z) - Model-Free Reinforcement Learning for Optimal Control of Markov Decision
Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process.
We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
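The tabular backbone of finite-horizon reinforcement learning can be sketched with time-indexed Q-tables (the signal temporal logic machinery of the paper is omitted; the MDP below is made up).

```python
import numpy as np

# Plain finite-horizon Q-learning on a synthetic MDP: one Q-table per
# time step, epsilon-greedy exploration, backward-consistent targets.
rng = np.random.default_rng(0)
nS, nA, H = 4, 2, 5
P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))

Q = np.zeros((H, nS, nA))
alpha, eps = 0.1, 0.2
for episode in range(20_000):
    s = rng.integers(nS)
    for t in range(H):
        a = rng.integers(nA) if rng.random() < eps else int(Q[t, s].argmax())
        s2 = rng.choice(nS, p=P[s, a])
        target = R[s, a] + (Q[t + 1, s2].max() if t + 1 < H else 0.0)
        Q[t, s, a] += alpha * (target - Q[t, s, a])
        s = s2
print("greedy first-step policy:", Q[0].argmax(axis=1))
```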
arXiv Detail & Related papers (2021-09-27T22:44:55Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Stochastic Finite State Control of POMDPs with LTL Specifications [14.163899014007647]
Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty.
This paper considers the quantitative problem of synthesizing sub-optimal finite state controllers (sFSCs) for POMDPs.
We propose a bounded policy algorithm, leading to controlled growth in sFSC size, and an anytime algorithm, where the performance of the controller improves with successive iterations.
arXiv Detail & Related papers (2020-01-21T18:10:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.