Rollout Heuristics for Online Stochastic Contingent Planning
- URL: http://arxiv.org/abs/2310.02345v1
- Date: Tue, 3 Oct 2023 18:24:47 GMT
- Title: Rollout Heuristics for Online Stochastic Contingent Planning
- Authors: Oded Blumenthal, Guy Shani
- Abstract summary: Partially Observable Monte-Carlo Planning is an online algorithm for deciding on the next action to perform.
POMCP is highly dependent on the rollout policy to compute good estimates.
In this paper, we model POMDPs as stochastic contingent planning problems.
- Score: 6.185979230964809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partially observable Markov decision processes (POMDP) are a useful model for
decision-making under partial observability and stochastic actions. Partially
Observable Monte-Carlo Planning is an online algorithm for deciding on the next
action to perform, using a Monte-Carlo tree search approach, based on the UCT
(UCB applied to trees) algorithm for fully observable Markov-decision
processes. POMCP develops an action-observation tree, and at the leaves, uses a
rollout policy to provide a value estimate for the leaf. As such, POMCP is
highly dependent on the rollout policy to compute good estimates, and hence
identify good actions. Thus, many practitioners who use POMCP are required to
create strong, domain-specific heuristics.
In this paper, we model POMDPs as stochastic contingent planning problems.
This allows us to leverage domain-independent heuristics that were developed in
the planning community. We suggest two heuristics, the first is based on the
well-known h_add heuristic from classical planning, and the second is computed
in belief space, taking the value of information into account.
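To make the mechanism described in the abstract concrete, here is a minimal, illustrative sketch of a POMCP-style rollout value estimate with a pluggable rollout policy, alongside a classical h_add computation over a delete relaxation. This is not the authors' implementation: the simulator interface (`step`), the STRIPS-like action encoding (`.pre`, `.add`, unit costs), and all function names are assumptions made for illustration; the paper's belief-space, value-of-information heuristic is not sketched.

```python
import math
import random

# Assumed simulator interface: step(state, action) -> (next_state, observation, reward, done).

def rollout_value(state, step, actions, rollout_policy, depth, gamma=0.95):
    """Estimate a leaf's value by simulating forward with the rollout policy."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = rollout_policy(state, actions)
        state, _obs, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total

def random_rollout_policy(state, actions):
    """Uninformed default rollout: pick a legal action uniformly at random."""
    return random.choice(actions)

def make_heuristic_rollout_policy(heuristic):
    """Greedy rollout biased by a heuristic (hypothetical signature:
    heuristic(state, action) -> estimated cost-to-go after applying action)."""
    def policy(state, actions):
        return min(actions, key=lambda a: heuristic(state, a))
    return policy

def ucb1(parent_visits, child_visits, child_value, c=1.0):
    """UCT/UCB1 score used inside the tree to trade off exploration and exploitation."""
    if child_visits == 0:
        return float("inf")
    return child_value + c * math.sqrt(math.log(parent_visits) / child_visits)

def h_add(state, goal, strips_actions):
    """Classical additive (h_add) heuristic over a delete relaxation, assuming a
    STRIPS-like encoding where each action has .pre and .add sets and unit cost."""
    cost = {p: 0 for p in state}            # propositions already true cost nothing
    changed = True
    while changed:                          # Bellman-Ford style fixpoint
        changed = False
        for a in strips_actions:
            if all(p in cost for p in a.pre):
                c = 1 + sum(cost[p] for p in a.pre)
                for p in a.add:
                    if c < cost.get(p, float("inf")):
                        cost[p] = c
                        changed = True
    return sum(cost.get(p, float("inf")) for p in goal)
```

Replacing the uniform random default with a heuristic-guided rollout policy is the lever the paper studies: better rollouts give better leaf estimates, and hence better action selection at the root.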
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes [1.445706856497821]
This work defines an MDP framework, the SD-MDP, where we disentangle the causal structure of MDPs' transition and reward dynamics.
We derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-23T16:22:40Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
POMCP suffers from a sparse reward function, namely, rewards achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
- Planning in Observable POMDPs in Quasipolynomial Time [21.03037504572896]
We develop a quasipolynomial-time algorithm for planning in observable POMDPs.
We assume that well-separated distributions on states lead to well-separated distributions on observations.
We prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
arXiv Detail & Related papers (2022-01-12T23:16:37Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, the agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
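The active-perception entry above rests on the idea that a submodular value function lets greedy sensor selection replace exhaustive search over sensor subsets. The sketch below is a generic greedy subset-selection routine under that assumption, not the cited paper's algorithm; the `gain` function and `budget` parameter are hypothetical.

```python
def greedy_sensor_selection(sensors, gain, budget):
    """Greedy subset selection for an assumed monotone submodular gain function
    (gain(subset) -> float, e.g., expected uncertainty reduction). For such
    functions, greedy selection is within a (1 - 1/e) factor of the best subset
    of the given size, avoiding enumeration of all sensor subsets."""
    chosen = set()
    for _ in range(budget):
        best, best_delta = None, 0.0
        for s in sensors:
            if s in chosen:
                continue
            delta = gain(chosen | {s}) - gain(chosen)   # marginal gain of adding s
            if best is None or delta > best_delta:
                best, best_delta = s, delta
        if best is None:
            break
        chosen.add(best)
    return chosen
```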