Verification of indefinite-horizon POMDPs
- URL: http://arxiv.org/abs/2007.00102v1
- Date: Tue, 30 Jun 2020 21:01:52 GMT
- Title: Verification of indefinite-horizon POMDPs
- Authors: Alexander Bork, Sebastian Junges, Joost-Pieter Katoen, Tim Quatmann
- Abstract summary: This paper considers the verification problem for partially observable MDPs.
We present an abstraction-refinement framework extending previous instantiations of the Lovejoy-approach.
- Score: 63.6726420864286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The verification problem in MDPs asks whether, for any policy resolving the
nondeterminism, the probability that something bad happens is bounded by some
given threshold. This verification problem is often overly pessimistic, as the
policies it considers may depend on the complete system state. This paper
considers the verification problem for partially observable MDPs, in which the
policies make their decisions based on (the history of) the observations
emitted by the system. We present an abstraction-refinement framework extending
previous instantiations of the Lovejoy-approach. Our experiments show that this
framework significantly improves the scalability of the approach.
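To make the problem statement concrete, below is a minimal sketch of the verification question itself, not the paper's abstraction-refinement (Lovejoy-style belief discretisation) method: it unfolds the belief MDP of a tiny invented POMDP to a bounded depth and asks whether the maximal probability, over observation-based policies, of reaching a bad state stays below a threshold. The model, state names, depth, and threshold are all illustrative assumptions, and a bounded unfolding only under-approximates the indefinite-horizon probability.

```python
# Minimal sketch, not the paper's method: unfold the belief MDP of a tiny
# hand-made POMDP to a bounded depth and maximise, over observation-based
# policies, the probability of reaching the absorbing "bad" state.
# All numbers below are invented for illustration.

STATES = ["safe", "risky", "bad"]            # "bad" is absorbing
ACTIONS = ["wait", "probe"]
OBSERVATIONS = ["quiet", "alarm"]

# T[s][a][s'] : transition probabilities (missing entries are 0)
T = {
    "safe":  {"wait":  {"safe": 0.9, "risky": 0.1},
              "probe": {"safe": 0.8, "risky": 0.2}},
    "risky": {"wait":  {"risky": 0.7, "bad": 0.3},
              "probe": {"safe": 0.5, "risky": 0.4, "bad": 0.1}},
    "bad":   {"wait":  {"bad": 1.0},
              "probe": {"bad": 1.0}},
}

# O[s'][z] : probability of observing z after moving to state s'
O = {
    "safe":  {"quiet": 0.9, "alarm": 0.1},
    "risky": {"quiet": 0.6, "alarm": 0.4},
    "bad":   {"quiet": 0.1, "alarm": 0.9},
}


def belief_update(belief, action, obs):
    """Bayes update; returns (posterior belief, probability of obs)."""
    unnormalised = {}
    for s_next in STATES:
        mass = sum(belief[s] * T[s][action].get(s_next, 0.0) for s in STATES)
        unnormalised[s_next] = mass * O[s_next][obs]
    obs_prob = sum(unnormalised.values())
    if obs_prob == 0.0:
        return None, 0.0
    return {s: p / obs_prob for s, p in unnormalised.items()}, obs_prob


def max_reach_bad(belief, depth):
    """Maximal probability, over observation-based policies, that the system
    has entered "bad" within `depth` steps. Since "bad" is absorbing, the
    mass on it at the horizon counts every path that ever reached it."""
    if depth == 0:
        return belief["bad"]
    best = 0.0
    for action in ACTIONS:
        value = 0.0
        for obs in OBSERVATIONS:
            posterior, obs_prob = belief_update(belief, action, obs)
            if obs_prob > 0.0:
                value += obs_prob * max_reach_bad(posterior, depth - 1)
        best = max(best, value)
    return best


if __name__ == "__main__":
    initial_belief = {"safe": 1.0, "risky": 0.0, "bad": 0.0}
    threshold = 0.5                              # the given bound
    worst_case = max_reach_bad(initial_belief, depth=6)
    print(f"max P(reach bad within 6 steps) = {worst_case:.4f}")
    print("bound holds" if worst_case <= threshold else "bound violated")
```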
Related papers
- A Finite-State Controller Based Offline Solver for Deterministic POMDPs [18.518047404768378]
We propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs.
DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs.
We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.
arXiv Detail & Related papers (2025-05-01T15:30:26Z) - Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs [11.829110453985228]
We investigate off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs) with large observation spaces.
We prove information-theoretic hardness for model-free OPE of history-dependent policies in several settings.
We show that some hardness can be circumvented by a natural model-based Markov algorithm.
arXiv Detail & Related papers (2025-03-03T03:29:05Z) - Hallucinated Adversarial Control for Conservative Offline Policy
Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where, given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z) - Towards Using Fully Observable Policies for POMDPs [0.0]
The Partially Observable Markov Decision Process (POMDP) is a framework applicable to many real-world problems.
We propose an approach to solve POMDPs with multimodal belief by relying on a policy that solves the fully observable version.
arXiv Detail & Related papers (2022-07-24T13:22:13Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in
Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Off-Policy Evaluation in Partially Observed Markov Decision Processes
under Sequential Ignorability [8.388782503421504]
We consider off-policy evaluation of dynamic treatment rules under sequential ignorability.
We show that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes.
arXiv Detail & Related papers (2021-10-24T03:35:23Z) - Smoother Entropy for Active State Trajectory Estimation and Obfuscation
in POMDPs [3.42658286826597]
Optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches; a standard definition of the smoother entropy is sketched after this list.
We identify belief-state MDP reformulations of both active estimation and obfuscation with concave cost and cost-to-go functions.
arXiv Detail & Related papers (2021-08-19T00:05:55Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Enforcing Almost-Sure Reachability in POMDPs [10.883864654718103]
Partially-Observable Markov Decision Processes (POMDPs) are a well-known model for sequential decision making under limited information.
We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state.
We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative.
arXiv Detail & Related papers (2020-06-30T19:59:46Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
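As a side note on the smoother-entropy entry above: the quantity is usually understood as the conditional entropy of the state trajectory given the observations and applied controls. The formulation below is a standard reading, not a quotation from that paper.

```latex
% Smoother entropy over horizon T (standard reading, hedged): the
% conditional entropy of the state trajectory X_{0:T} given the
% observations Y_{0:T} and the applied controls U_{0:T-1}.
\[
  H\bigl(X_{0:T} \mid Y_{0:T}, U_{0:T-1}\bigr)
  = -\,\mathbb{E}\!\left[\log p\bigl(X_{0:T} \mid Y_{0:T}, U_{0:T-1}\bigr)\right]
\]
% Minimising this quantity favours accurate trajectory estimation;
% maximising it obfuscates the trajectory from an observer, matching the
% two use cases named in the entry.
```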