Strengthening Deterministic Policies for POMDPs
- URL: http://arxiv.org/abs/2007.08351v1
- Date: Thu, 16 Jul 2020 14:22:55 GMT
- Title: Strengthening Deterministic Policies for POMDPs
- Authors: Leonore Winterer, Ralf Wimmer, Nils Jansen, Bernd Becker
- Abstract summary: We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints.
We employ a preprocessing of the POMDP to encompass memory-based decisions.
The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
- Score: 5.092711491848192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The synthesis problem for partially observable Markov decision processes
(POMDPs) is to compute a policy that satisfies a given specification. Such
policies have to take the full execution history of a POMDP into account,
rendering the problem undecidable in general. A common approach is to use a
limited amount of memory and randomize over potential choices. Yet, this
problem is still NP-hard and often computationally intractable in practice. A
restricted problem is to use neither history nor randomization, yielding
policies that are called stationary and deterministic. Previous approaches to
compute such policies employ mixed-integer linear programming (MILP). We
provide a novel MILP encoding that supports sophisticated specifications in the
form of temporal logic constraints. It is able to handle an arbitrary number of
such specifications. Yet, randomization and memory are often mandatory to
achieve satisfactory policies. First, we extend our encoding to deliver a
restricted class of randomized policies. Second, based on the results of the
original MILP, we employ a preprocessing of the POMDP to encompass memory-based
decisions. The advantages of our approach over state-of-the-art POMDP solvers
lie (1) in the flexibility to strengthen simple deterministic policies without
losing computational tractability and (2) in the ability to enforce the
provable satisfaction of arbitrarily many specifications. The latter point
allows taking trade-offs between performance and safety aspects of typical
POMDP examples into account. We show the effectiveness of our method on a broad
range of benchmarks.
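To make the setting concrete, the following is a minimal, hedged sketch of the kind of MILP the abstract refers to: it jointly selects a stationary deterministic, observation-based policy and its value for a small POMDP, here maximizing expected discounted reward rather than the temporal-logic specifications treated in the paper. The toy model, the PuLP-based formulation, and all identifiers are illustrative assumptions, not the paper's actual encoding.

```python
# Hedged sketch (not the paper's encoding): MILP for a stationary deterministic,
# observation-based POMDP policy maximizing expected discounted reward.
import pulp

# --- toy POMDP (hypothetical data) ------------------------------------------
states = ["s0", "s1", "s2"]
actions = ["a", "b"]
obs_of = {"s0": "o0", "s1": "o0", "s2": "o1"}      # s0 and s1 are aliased
observations = sorted(set(obs_of.values()))
gamma = 0.9
# transition probabilities P[(s, a)][s'] and immediate rewards R[(s, a)]
P = {
    ("s0", "a"): {"s1": 1.0}, ("s0", "b"): {"s2": 1.0},
    ("s1", "a"): {"s2": 1.0}, ("s1", "b"): {"s0": 1.0},
    ("s2", "a"): {"s2": 1.0}, ("s2", "b"): {"s2": 1.0},
}
R = {("s0", "a"): 0.0, ("s0", "b"): 1.0,
     ("s1", "a"): 1.0, ("s1", "b"): 0.0,
     ("s2", "a"): 0.0, ("s2", "b"): 0.0}
init = {"s0": 1.0}                                  # initial distribution

r_max = max(abs(r) for r in R.values())
v_max = r_max / (1 - gamma)                         # bound on any value
big_m = 2 * v_max                                   # slack for unchosen actions

prob = pulp.LpProblem("det_pomdp_policy", pulp.LpMaximize)
# binary policy variables: sigma[o][a] = 1 iff action a is chosen under observation o
sigma = pulp.LpVariable.dicts("sigma", (observations, actions), cat="Binary")
# value variables, one per state
V = pulp.LpVariable.dicts("V", states, lowBound=-v_max, upBound=v_max)

# each observation selects exactly one action (deterministic, memoryless policy)
for o in observations:
    prob += pulp.lpSum(sigma[o][a] for a in actions) == 1

# Bellman inequalities, deactivated via big-M for actions the policy does not take
for s in states:
    for a in actions:
        prob += V[s] <= R[(s, a)] + gamma * pulp.lpSum(
            p * V[t] for t, p in P[(s, a)].items()
        ) + big_m * (1 - sigma[obs_of[s]][a])

# objective: value of the initial distribution
prob += pulp.lpSum(w * V[s] for s, w in init.items())
prob.solve(pulp.PULP_CBC_CMD(msg=False))

policy = {o: next(a for a in actions if sigma[o][a].value() > 0.5)
          for o in observations}
print("policy:", policy, "value:", pulp.value(prob.objective))
```

For the chosen actions the big-M terms vanish, so the inequalities pin each value variable to the value achieved by that policy, while the binary constraints enforce exactly one action per observation; the solver then searches over these binary choices. Specification constraints of the kind discussed in the paper would enter as additional linear constraints on such variables.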
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
POMCP suffers from sparse reward functions, i.e., rewards are obtained only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
arXiv Detail & Related papers (2023-03-16T09:37:10Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Randomized Policy Optimization for Optimal Stopping [0.0]
We propose a new methodology for optimal stopping based on randomized linear policies.
We show that our approach can substantially outperform state-of-the-art methods.
arXiv Detail & Related papers (2022-03-25T04:33:15Z) - LTL-Constrained Steady-State Policy Synthesis [0.0]
We study Markov decision processes (MDPs) with specifications that combine constraints of several types, including LTL and steady-state constraints.
We provide a unified solution reducing the multi-type specification to a multi-dimensional long-run average reward.
The algorithm also extends to general $\omega$-regular properties and runs in time polynomial in the sizes of the MDP and the LDBA.
arXiv Detail & Related papers (2021-05-31T11:35:42Z) - Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z) - Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP).
We show how to use point-based value iteration methods (a minimal point-based backup is sketched below) to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
arXiv Detail & Related papers (2020-01-11T23:09:25Z)
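As a companion to the point-based value iteration entry above, here is a minimal, self-contained point-based backup on a randomly generated toy POMDP. It approximates expected discounted reward rather than the probability of satisfying a logical formula as in that paper; the model sizes, belief set, and all names are assumptions for illustration only.

```python
# Hedged sketch: one point-based value-iteration (PBVI) style backup on a toy
# POMDP.  All data below is randomly generated and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nO = 3, 2, 2                      # states, actions, observations
gamma = 0.95

# toy model: T[a, s, s'] transitions, Z[a, s', o] observations, R[s, a] rewards
T = rng.random((nA, nS, nS)); T /= T.sum(axis=2, keepdims=True)
Z = rng.random((nA, nS, nO)); Z /= Z.sum(axis=2, keepdims=True)
R = rng.random((nS, nA))

# a fixed set of belief points and an initial alpha-vector set
B = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [1/3, 1/3, 1/3]])
Gamma = [np.zeros(nS)]                    # start from the zero value function

def pbvi_backup(Gamma, B):
    """One synchronous point-based backup: one new alpha-vector per belief."""
    new_Gamma = []
    for b in B:
        best_val, best_alpha = -np.inf, None
        for a in range(nA):
            alpha_ao = np.zeros(nS)
            for o in range(nO):
                # g_{a,o}(s) = sum_{s'} T[a, s, s'] * Z[a, s', o] * alpha(s')
                projections = [T[a] @ (Z[a, :, o] * alpha) for alpha in Gamma]
                # keep the projection that is best at this belief point
                alpha_ao += max(projections, key=lambda g: b @ g)
            cand = R[:, a] + gamma * alpha_ao
            if b @ cand > best_val:
                best_val, best_alpha = b @ cand, cand
        new_Gamma.append(best_alpha)
    return new_Gamma

for _ in range(50):                        # repeat backups until roughly stable
    Gamma = pbvi_backup(Gamma, B)

values = [max(b @ alpha for alpha in Gamma) for b in B]
print("approximate values at the belief points:", np.round(values, 3))
```

Each backup keeps the value function as a set of alpha-vectors, one per belief point, so the cost per iteration grows with the number of belief points rather than with the full belief space; model-checking variants replace the reward objective with the satisfaction probability of the formula.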