LTL-Constrained Steady-State Policy Synthesis
- URL: http://arxiv.org/abs/2105.14894v1
- Date: Mon, 31 May 2021 11:35:42 GMT
- Title: LTL-Constrained Steady-State Policy Synthesis
- Authors: Jan Křetínský
- Abstract summary: We study Markov decision processes (MDP) with specifications combining LTL constraints, steady-state constraints, and rewards.
We provide a unified solution reducing the multi-type specification to a multi-dimensional long-run average reward.
The algorithm also extends to general $\omega$-regular properties and runs in time polynomial in the sizes of the MDP as well as the LDBA.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Decision-making policies for agents are often synthesized with the constraint
that a formal specification of behaviour is satisfied. Here we focus on
infinite-horizon properties. On the one hand, Linear Temporal Logic (LTL) is a
popular example of a formalism for qualitative specifications. On the other
hand, Steady-State Policy Synthesis (SSPS) has recently received considerable
attention as it provides a more quantitative and more behavioural perspective
on specifications, in terms of the frequency with which states are visited.
Finally, rewards provide a classic framework for quantitative properties. In
this paper, we study Markov decision processes (MDP) with the specification
combining all three of these types. The derived policy maximizes the reward among
all policies ensuring the LTL specification with the given probability and
adhering to the steady-state constraints. To this end, we provide a unified
solution reducing the multi-type specification to a multi-dimensional long-run
average reward. This is enabled by Limit-Deterministic Büchi Automata (LDBA),
recently studied in the context of LTL model checking on MDP, and allows for an
elegant solution through a simple linear programme. The algorithm also extends
to the general $\omega$-regular properties and runs in time polynomial in the
sizes of the MDP as well as the LDBA.
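The linear-programming view behind this reduction is simple enough to sketch. Below is a minimal, illustrative version covering the steady-state and reward parts on a toy MDP, assuming the LTL part has already been folded in by taking the product with an LDBA (omitted here); the states, transition probabilities, rewards, and the frequency bound are all assumptions made up for the example, not the paper's construction.

```python
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                                   # toy MDP: states 0..2, actions 0..1
# P[s, a, t]: probability of moving from s to t under action a (assumed numbers).
P = np.zeros((S, A, S))
P[0, 0] = [0.9, 0.1, 0.0]; P[0, 1] = [0.1, 0.0, 0.9]
P[1, 0] = [0.0, 1.0, 0.0]; P[1, 1] = [0.5, 0.5, 0.0]
P[2, 0] = [0.0, 0.0, 1.0]; P[2, 1] = [1.0, 0.0, 0.0]
R = np.array([[1.0, 0.0],                     # R[s, a]: immediate reward (assumed)
              [0.0, 2.0],
              [3.0, 0.0]])

n = S * A                                     # one variable x[s, a] per pair
idx = lambda s, a: s * A + a

# Stationarity: for every state t, long-run outflow equals inflow,
# and the frequencies x[s, a] form a probability distribution.
A_eq = np.zeros((S + 1, n)); b_eq = np.zeros(S + 1)
for t in range(S):
    for a in range(A):
        A_eq[t, idx(t, a)] += 1.0             # frequency of leaving t
    for s in range(S):
        for a in range(A):
            A_eq[t, idx(s, a)] -= P[s, a, t]  # frequency of entering t
A_eq[S, :] = 1.0; b_eq[S] = 1.0

# Steady-state constraint (assumed): visit state 2 with frequency >= 0.2,
# written as -(x[2,0] + x[2,1]) <= -0.2 for linprog.
A_ub = np.zeros((1, n)); A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.2])

res = linprog(-R.flatten(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n)
x = res.x.reshape(S, A)
print("optimal long-run average reward:", -res.fun)
# A stationary policy is read off by normalising frequencies per state.
print("policy:", x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12))
```

The variables x[s, a] are long-run state-action frequencies, which is what makes steady-state constraints (bounds on visit frequencies) linear in the same program as the average-reward objective.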
Related papers
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL)
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
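To make the "sequences of truth assignments" idea concrete, here is a small hedged sketch (not the paper's code): a Büchi-style automaton for the co-safe formula "eventually g" is searched for short assignment sequences that reach its accepting sink, and each such sequence is a symbolic plan a policy could be conditioned on. The automaton, propositions, and helper names are assumptions.

```python
from collections import deque

# Automaton for "eventually g" (assumed): edges are labelled by the set of
# propositions that are true; "acc" is an accepting sink, which suffices for
# acceptance of this simple co-safe formula.
transitions = {
    "q0": {frozenset(): "q0", frozenset({"g"}): "acc"},
    "acc": {frozenset(): "acc", frozenset({"g"}): "acc"},
}
accepting = {"acc"}

def assignment_sequences(start="q0", max_len=3):
    """Enumerate short truth-assignment sequences reaching the accepting sink."""
    queue = deque([(start, [])])
    while queue:
        q, seq = queue.popleft()
        if q in accepting:
            yield seq
            continue
        if len(seq) < max_len:
            for assignment, nxt in transitions[q].items():
                queue.append((nxt, seq + [assignment]))

for seq in assignment_sequences():
    # Each sequence is a symbolic plan the learned policy is conditioned on,
    # e.g. encoded and concatenated with the environment observation.
    print([sorted(a) for a in seq])
```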
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
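A minimal sketch of the cited idea, under assumed numbers: treat the LDBA, with an assumed distribution over labels, as a Markov reward process, solve for its state values, and use them as a potential for exploration bonuses on the product.

```python
import numpy as np

# Automaton states 0..2, where 2 is accepting; P_aut[q, q'] is the chance of
# moving q -> q' when labels are sampled from an assumed distribution.
P_aut = np.array([[0.8, 0.2, 0.0],
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])        # reward for sitting in accepting states
gamma = 0.95

# MRP fixed point V = r + gamma * P V, i.e. V = (I - gamma P)^{-1} r.
V = np.linalg.solve(np.eye(3) - gamma * P_aut, r)
print("automaton-state values:", V)

def shaping_bonus(q, q_next):
    # Potential-based bonus: positive when the automaton makes progress.
    return gamma * V[q_next] - V[q]

print("bonus for progress 0 -> 1:", shaping_bonus(0, 1))
```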
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy.
We demonstrate that the sample complexity of the PEDEL algorithm of Wagenmaker et al. closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z) - Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear
Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaLASSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
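As a hedged illustration of the regularization technique named above (on synthetic data, not the hydrothermal dispatch problem): the adaptive lasso reweights the L1 penalty per coefficient using a pilot estimate, which can be solved as an ordinary lasso on rescaled features. The data, weight exponent, and penalty strength are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # uncertainty features (assumed)
beta_true = np.array([2.0, -1.5] + [0.0] * 8)
y = X @ beta_true + 0.1 * rng.normal(size=200)   # decisions to imitate

# Stage 1: pilot estimate (ridge keeps weights finite for zero coefficients).
pilot = Ridge(alpha=1.0).fit(X, y).coef_
w = 1.0 / (np.abs(pilot) + 1e-6)          # adaptive weights, exponent 1

# Stage 2: the weighted lasso equals an ordinary lasso on columns of X
# divided by w; coefficients are then mapped back to the original scale.
lasso = Lasso(alpha=0.05).fit(X / w, y)
beta_hat = lasso.coef_ / w
print("nonzero coefficients:", np.flatnonzero(np.abs(beta_hat) > 1e-8))
```

The adaptive weights shrink likely-irrelevant coefficients harder than the plain lasso, which is what counters the overfitting of non-regularized LDRs.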
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
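A generic sketch of a product construction in this spirit, simplified to a single accepting set (an LDGBA tracks several); the labelling, automaton, and reward scheme below are assumptions, not the paper's exact EP-MDP definition.

```python
# Labels of MDP states and automaton edges (all assumed): the joint state is
# (MDP state, automaton state); the automaton advances on the label of the
# MDP state just reached, and accepting moves emit reward.
label = {0: frozenset(), 1: frozenset({"a"}), 2: frozenset({"b"})}
delta = {("q0", frozenset({"a"})): "q1", ("q1", frozenset({"b"})): "q0"}
accepting = {("q1", frozenset({"b"}))}          # accepting transitions

def toy_mdp_step(s, action):
    return (s + action) % 3                     # toy 3-state ring dynamics

def product_step(s, q, action):
    """One step of the product: advance the MDP, then the automaton."""
    s_next = toy_mdp_step(s, action)
    key = (q, label[s_next])
    q_next = delta.get(key, q)                  # no matching edge: stay put
    reward = 1.0 if key in accepting else 0.0   # reward accepting moves
    return (s_next, q_next), reward

state, q = 0, "q0"
for a in [1, 1, 1, 1]:
    (state, q), r = product_step(state, q, a)
    print(state, q, r)
```

Because the reward depends only on the product state, any model-free RL algorithm can be run on it unchanged, which is the point of the EP-MDP-style schemes.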
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z) - LTLf Synthesis on Probabilistic Systems [0.0]
Synthesis is used to find a policy that maximizes the probability of achieving the desired behavior.
No tools exist to solve policy synthesis for MDP behaviors given finite-trace (LTLf) properties.
We present two algorithms for solving this problem: the first via reduction of LTLf to LTL, and the second using native tools for LTLf.
arXiv Detail & Related papers (2020-09-23T01:26:47Z) - Strengthening Deterministic Policies for POMDPs [5.092711491848192]
We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints.
We employ a preprocessing of the POMDP to encompass memory-based decisions.
The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
arXiv Detail & Related papers (2020-07-16T14:22:55Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
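The general recipe in this line of work can be sketched as tabular Q-learning on the MDP-LDBA product, rewarding accepting visits. This is only an illustrative toy, not the paper's algorithm, which needs additional machinery for the maximal-probability guarantee; all numbers are assumptions.

```python
import random

# Toy product of a 3-state ring MDP with a 2-state automaton for "visit
# state 2 infinitely often" (all assumed for illustration).
def step(s, q, a):
    s2 = (s + a) % 3
    q2 = "acc" if s2 == 2 else "q0"
    return s2, q2, (1.0 if q2 == "acc" else 0.0)  # reward accepting visits

Q = {}                                            # tabular Q over (s, q, a)
alpha, gamma, eps = 0.1, 0.99, 0.2
acts = [-1, 1]
qv = lambda s, q, a: Q.get((s, q, a), 0.0)

s, q = 0, "q0"
for _ in range(5000):
    a = random.choice(acts) if random.random() < eps else \
        max(acts, key=lambda b: qv(s, q, b))
    s2, q2, r = step(s, q, a)
    Q[(s, q, a)] = qv(s, q, a) + alpha * (
        r + gamma * max(qv(s2, q2, b) for b in acts) - qv(s, q, a))
    s, q = s2, q2

print("greedy action at (0, 'q0'):", max(acts, key=lambda b: qv(0, "q0", b)))
```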
arXiv Detail & Related papers (2019-02-02T20:09:32Z)