Rule-based Shielding for Partially Observable Monte-Carlo Planning
- URL: http://arxiv.org/abs/2104.13791v1
- Date: Wed, 28 Apr 2021 14:23:38 GMT
- Title: Rule-based Shielding for Partially Observable Monte-Carlo Planning
- Authors: Giulio Mazzi, Alberto Castellini, Alessandro Farinelli
- Abstract summary: We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
- Score: 78.05638156687343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partially Observable Monte-Carlo Planning (POMCP) is a powerful online
algorithm able to generate approximate policies for large Partially Observable
Markov Decision Processes. The online nature of this method supports
scalability by avoiding a complete policy representation. The lack of an
explicit representation, however, hinders policy interpretability and makes
policy verification very complex. In this work, we propose two contributions. The
first is a method for identifying unexpected actions selected by POMCP with
respect to expert prior knowledge of the task. The second is a shielding
approach that prevents POMCP from selecting unexpected actions. The first
method is based on Satisfiability Modulo Theory (SMT). It inspects traces
(i.e., sequences of belief-action-observation triplets) generated by POMCP to
compute the parameters of logical formulas about policy properties defined by
the expert. The second contribution is a module that uses the logical formulas
online to identify anomalous actions selected by POMCP and substitutes them
with actions that satisfy the formulas, thus fulfilling expert knowledge. We
evaluate our approach on Tiger, a standard benchmark for POMDPs,
and a real-world problem related to velocity regulation in mobile robot
navigation. Results show that the shielded POMCP outperforms the standard
POMCP in a case study in which a wrong POMCP parameter causes it to select
wrong actions from time to time. Moreover, we show that the approach maintains
good performance even when the parameters of the logical formulas are
optimized using trajectories containing some wrong actions.
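To make the two contributions concrete, below is a minimal Python sketch on the Tiger benchmark: an SMT-based fit of a rule parameter from traces (here via the z3 solver), followed by an online shield that substitutes rule-violating actions. The trace format, the single-threshold rule, and the names `fit_threshold` and `shield` are our own illustrative simplifications, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): fit a rule parameter from
# POMCP traces with the z3 SMT solver, then shield actions online.
# Assumed rule: "open a door only if the belief that it is safe is >= theta".
from z3 import Optimize, Real, sat

# Hypothetical traces: (belief that the tiger is behind the LEFT door, action).
traces = [
    (0.04, "open_left"),   # confidence the left door is safe: 1 - 0.04
    (0.93, "open_right"),  # confidence the right door is safe: 0.93
    (0.50, "listen"),
    (0.07, "open_left"),
]

def fit_threshold(traces):
    """Largest theta consistent with every 'open' decision in the traces."""
    opt, theta = Optimize(), Real("theta")
    opt.add(theta >= 0, theta <= 1)
    for b_left, action in traces:
        if action == "open_right":
            opt.add(b_left >= theta)       # tiger believed to be behind left
        elif action == "open_left":
            opt.add(1 - b_left >= theta)   # tiger believed to be behind right
    opt.maximize(theta)
    assert opt.check() == sat
    m = opt.model()[theta]
    return m.numerator_as_long() / m.denominator_as_long()

def shield(action, b_left, theta):
    """Replace an action that violates the learned rule with a safe one."""
    if action == "open_right" and b_left < theta:
        return "listen"
    if action == "open_left" and (1 - b_left) < theta:
        return "listen"
    return action

theta = fit_threshold(traces)                          # 0.93 on these traces
print(shield("open_left", b_left=0.45, theta=theta))   # -> "listen"
```

In the paper the expert-defined formulas are richer than a single threshold, but the shielding principle is the same: an action outside the satisfying set of the formula is replaced by one inside it.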
Related papers
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn from high-quality traces of POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
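The belief-based specifications mentioned above are learned as Answer Set Programming rules over discretized beliefs; the snippet below is only a hypothetical Python paraphrase of evaluating one such rule on the Tiger domain (the rule body and the 0.9 threshold are invented for illustration).

```python
# Hypothetical paraphrase of a learned belief-based specification:
# "prefer listen while no state is believed with confidence >= 0.9".
def preferred_actions(belief):
    """belief: dict from states to probabilities, e.g. {'tiger_left': 0.6, ...}"""
    if max(belief.values()) < 0.9:        # no state is near-certain yet
        return {"listen"}
    safest = max(belief, key=belief.get)  # open the door away from the tiger
    return {"open_right"} if safest == "tiger_left" else {"open_left"}

print(preferred_actions({"tiger_left": 0.6, "tiger_right": 0.4}))  # {'listen'}
```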
- Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z)
- Rollout Heuristics for Online Stochastic Contingent Planning [6.185979230964809]
Partially Observable Monte-Carlo Planning is an online algorithm for deciding on the next action to perform.
POMCP is highly dependent on the rollout policy to compute good estimates.
In this paper, we model POMDPs as contingent planning problems.
arXiv Detail & Related papers (2023-10-03T18:24:47Z)
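The rollout in question is the Monte-Carlo value estimate POMCP computes at tree leaves. A generic sketch with a uniform rollout policy follows; `simulator`, the action list, and the discount are assumed inputs, and the paper's contribution is precisely to replace the uniform choice with planning-based heuristics.

```python
import random

def rollout(state, simulator, actions, depth, gamma=0.95):
    """Estimate the value of `state` by simulating a rollout policy to `depth`.
    `simulator(state, action) -> (next_state, reward)` is a generative model."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = random.choice(actions)           # uniform rollout policy
        state, reward = simulator(state, action)  # sample the generative model
        total += discount * reward
        discount *= gamma
    return total
```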
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
POMCP suffers from sparse reward functions, namely, rewards achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
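One common way to turn such learned specifications into soft guidance is to add a decaying prior bonus to the UCB1 score used during tree search; the sketch below assumes that form (the bonus shape, constants, and `satisfies_spec` predicate are illustrative, not taken from the paper).

```python
import math

def ucb1_with_soft_guidance(counts, values, actions, satisfies_spec,
                            c=1.0, bonus=0.5):
    """Select an action by UCB1 plus a visit-decaying bonus for actions that
    satisfy the learned logic specification (soft, not hard, guidance)."""
    total = sum(counts[a] for a in actions) + 1
    def score(a):
        exploration = c * math.sqrt(math.log(total) / (counts[a] + 1))
        guidance = bonus / (counts[a] + 1) if satisfies_spec(a) else 0.0
        return values[a] + exploration + guidance
    return max(actions, key=score)
```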
- Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between s-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z)
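The flavor of those regularized operators can be seen in the sa-rectangular $L_1$ case, where the inner adversarial problem reduces to a penalty proportional to half the span of the value function (the dual $L_\infty$ seminorm). Below is a numpy value-iteration sketch under that relaxation (simplex boundary ignored); shapes and uncertainty radii are assumed inputs.

```python
import numpy as np

def robust_value_iteration(P, R, beta, gamma=0.9, iters=500):
    """Value iteration for sa-rectangular L1-robust MDPs via regularization.
    P: (S, A, S) nominal transitions; R: (S, A) rewards; beta: (S, A)
    uncertainty radii. The worst-case L1 transition perturbation costs the
    agent beta * span(v) / 2, turning robustness into a regularizer."""
    v = np.zeros(R.shape[0])
    for _ in range(iters):
        penalty = beta * (v.max() - v.min()) / 2.0   # (S, A) regularization
        v = (R + gamma * ((P @ v) - penalty)).max(axis=1)
    return v
```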
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
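The product construction underlying the EP-MDP pairs each MDP state with an automaton state and advances the automaton on the labels of visited states; the EP-MDP additionally carries a tracking-frontier set for the generalized Büchi acceptance, omitted in this generic sketch (the labeling function and automaton transition are assumed inputs).

```python
def product_step(mdp_step, delta, label, state, q, action):
    """One synchronous step of the product between an MDP and an automaton.
    `mdp_step(s, a) -> (s', reward)` samples the MDP; `delta(q, letter) -> q'`
    is the automaton transition; `label(s')` is the letter read in s'."""
    s_next, reward = mdp_step(state, action)
    q_next = delta(q, label(s_next))   # automaton tracks the visited labels
    return (s_next, q_next), reward
```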
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Strengthening Deterministic Policies for POMDPs [5.092711491848192]
We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints.
We employ a preprocessing of the POMDP to encompass memory-based decisions.
The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
arXiv Detail & Related papers (2020-07-16T14:22:55Z)
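At its core, an encoding like this represents a deterministic observation-based policy with binary selection variables, one per observation-action pair. The toy PuLP model below shows only that core on a Tiger-like example (one action per observation, a myopic objective with invented rewards), not the paper's full MILP encoding with temporal logic constraints and memory.

```python
import pulp

observations = ["hear_left", "hear_right"]
actions = ["listen", "open_left", "open_right"]
# Invented expected immediate rewards for taking `a` after observing `o`.
reward = {("hear_left", "open_right"): 8,  ("hear_left", "listen"): -1,
          ("hear_left", "open_left"): -90, ("hear_right", "open_left"): 8,
          ("hear_right", "listen"): -1,    ("hear_right", "open_right"): -90}

prob = pulp.LpProblem("deterministic_pomdp_policy", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (observations, actions), cat="Binary")
for o in observations:  # a deterministic policy: exactly one action per obs.
    prob += pulp.lpSum(x[o][a] for a in actions) == 1
prob += pulp.lpSum(reward[o, a] * x[o][a]
                   for o in observations for a in actions)
prob.solve(pulp.PULP_CBC_CMD(msg=0))
policy = {o: next(a for a in actions if x[o][a].value() == 1)
          for o in observations}
print(policy)  # {'hear_left': 'open_right', 'hear_right': 'open_left'}
```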