Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach
- URL: http://arxiv.org/abs/2012.12732v1
- Date: Wed, 23 Dec 2020 15:09:28 GMT
- Title: Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach
- Authors: Giulio Mazzi, Alberto Castellini, Alessandro Farinelli
- Abstract summary: We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
- Score: 78.05638156687343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partially Observable Monte-Carlo Planning (POMCP) is a powerful online
algorithm able to generate approximate policies for large Partially Observable
Markov Decision Processes. The online nature of this method supports
scalability by avoiding complete policy representation. The lack of an explicit
representation however hinders interpretability. In this work, we propose a
methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP
policies by inspecting their traces, namely sequences of
belief-action-observation triplets generated by the algorithm. The proposed
method explores local properties of policy behavior to identify unexpected
decisions. We propose an iterative process of trace analysis consisting of
three main steps, i) the definition of a question by means of a parametric
logical formula describing (probabilistic) relationships between beliefs and
actions, ii) the generation of an answer by computing the parameters of the
logical formula that maximize the number of satisfied clauses (solving a
MAX-SMT problem), iii) the analysis of the generated logical formula and the
related decision boundaries for identifying unexpected decisions made by POMCP
with respect to the original question. We evaluate our approach on Tiger, a
standard benchmark for POMDPs, and a real-world problem related to mobile robot
navigation. Results show that the approach can exploit human knowledge on the
domain, outperforming state-of-the-art anomaly detection methods in identifying
unexpected decisions. An improvement of the Area Under Curve up to 47\% has
been achieved in our tests.
Related papers
- Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem.
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP)
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Point-Based Methods for Model Checking in Partially Observable Markov
Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP)
We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
arXiv Detail & Related papers (2020-01-11T23:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.