Related papers: Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes

URL: http://arxiv.org/abs/2001.03809v1
Date: Sat, 11 Jan 2020 23:09:25 GMT
Title: Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
Authors: Maxime Bouton, Jana Tumova, and Mykel J. Kochenderfer
Abstract summary: We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
Score: 36.07746952116073
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
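The idea in the abstract, approximating a maximum satisfaction probability with point-based value iteration, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Tiger-like toy POMDP, the 85% listening accuracy, the fixed belief grid, and the function names (`transition`, `observation`, `backup`, `pbvi`, `value`). The absorbing "success" state stands in for the accepting states that a product between a POMDP and an automaton for the formula would provide; the sketch only handles this reachability step, not the construction of the product.

```python
# Hypothetical toy POMDP (not from the paper).
# States: 0 = target left, 1 = target right,
#         2 = success (absorbing), 3 = failure (absorbing).
N_STATES, N_ACTIONS, N_OBS = 4, 3, 2
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2


def transition(s, a):
    """Return the next-state distribution T(s, a, .) as a list."""
    nxt = [0.0] * N_STATES
    if s >= 2 or a == LISTEN:
        nxt[s] = 1.0          # absorbing states and listening keep the state
    elif (a == OPEN_LEFT) == (s == 0):
        nxt[2] = 1.0          # opened the correct side
    else:
        nxt[3] = 1.0          # opened the wrong side
    return nxt


def observation(a, s_next):
    """Return O(a, s', .): listening is 85% accurate, otherwise uninformative."""
    if a == LISTEN and s_next < 2:
        return [0.85, 0.15] if s_next == 0 else [0.15, 0.85]
    return [0.5, 0.5]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def backup(b, gamma):
    """Point-based Bellman backup at belief b (no reward, no discount)."""
    best = None
    for a in range(N_ACTIONS):
        alpha_a = [0.0] * N_STATES
        for o in range(N_OBS):
            # Unnormalized successor belief after taking a and observing o.
            b_ao = [sum(b[s] * transition(s, a)[sp] for s in range(N_STATES))
                    * observation(a, sp)[o] for sp in range(N_STATES)]
            # Best existing alpha vector for that successor belief.
            alpha_ao = max(gamma, key=lambda al: dot(al, b_ao))
            for s in range(N_STATES):
                alpha_a[s] += sum(
                    transition(s, a)[sp] * observation(a, sp)[o] * alpha_ao[sp]
                    for sp in range(N_STATES))
        if best is None or dot(alpha_a, b) > dot(best, b):
            best = alpha_a
    return best


def pbvi(beliefs, n_iters):
    """Run n_iters rounds of point-based backups over a fixed belief set."""
    gamma = [[0.0, 0.0, 1.0, 0.0]]  # indicator of the success state
    for _ in range(n_iters):
        gamma = [backup(b, gamma) for b in beliefs]
    return gamma


def value(b, gamma):
    """Approximate probability of eventually reaching success from belief b."""
    return max(dot(al, b) for al in gamma)


BELIEFS = [[p / 10, 1 - p / 10, 0.0, 0.0] for p in range(11)]
UNIFORM = [0.5, 0.5, 0.0, 0.0]

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, round(value(UNIFORM, pbvi(BELIEFS, k)), 4))
```

In this toy problem, one backup gives 0.5 at the uniform belief (open a side blindly) and two backups give 0.85 (listen once, then open). Starting from the indicator of the success state keeps every alpha vector a valid success probability of some finite plan, so the reported value is a lower bound on the true maximum.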

Related papers

Learning Policy Representations for Steerable Behavior Synthesis [80.4542176039074]
Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time. We show that these representations can be approximated uniformly for a range of policies using a set-based architecture. We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions.
arXiv Detail & Related papers (2026-01-29T21:52:06Z)
Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy. We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop a model-based imitation of state-only sequences with a non-Markov Decision Process (nMDP). We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process. We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
arXiv Detail & Related papers (2021-09-27T22:44:55Z)
Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP). The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP). The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP. The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces. The proposed method explores local properties of policy behavior to identify unexpected decisions. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
Structural Estimation of Partially Observable Markov Decision Processes [3.1614382994158956]
We consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process. We illustrate the estimation methodology with an application to optimal equipment replacement.
arXiv Detail & Related papers (2020-08-02T15:04:27Z)
Strengthening Deterministic Policies for POMDPs [5.092711491848192]
We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. We employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
arXiv Detail & Related papers (2020-07-16T14:22:55Z)
Enforcing Almost-Sure Reachability in POMDPs [10.883864654718103]
Partially-Observable Markov Decision Processes (POMDPs) are a well-known model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative.
arXiv Detail & Related papers (2020-06-30T19:59:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.