Monte Carlo Planning in Hybrid Belief POMDPs
- URL: http://arxiv.org/abs/2211.07735v2
- Date: Tue, 2 May 2023 18:09:44 GMT
- Title: Monte Carlo Planning in Hybrid Belief POMDPs
- Authors: Moran Barenboim, Moshe Shienman and Vadim Indelman
- Abstract summary: We present Hybrid Belief Monte Carlo Planning (HB-MCP) that utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP.
We show how the upper confidence bound (UCB) exploration bonus can be leveraged to guide the growth of hypotheses trees alongside the belief trees.
We then evaluate our approach in highly aliased simulated environments where unresolved data association leads to multi-modal belief hypotheses.
- Score: 7.928094304325113
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world problems often require reasoning about hybrid beliefs, over both
discrete and continuous random variables. Yet, such a setting has hardly been
investigated in the context of planning. Moreover, existing online Partially
Observable Markov Decision Processes (POMDPs) solvers do not support hybrid
beliefs directly. In particular, these solvers do not address the added
computational burden due to an increasing number of hypotheses with the
planning horizon, which can grow exponentially. As part of this work, we
present a novel algorithm, Hybrid Belief Monte Carlo Planning (HB-MCP), that
utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP while
maintaining a hybrid belief. We illustrate how the upper confidence bound (UCB)
exploration bonus can be leveraged to guide the growth of hypotheses trees
alongside the belief trees. We then evaluate our approach in highly aliased
simulated environments where unresolved data association leads to multi-modal
belief hypotheses.
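The abstract gives no pseudocode, so the following is a minimal, hypothetical sketch of the mechanism it describes: the same UCB rule that trades off exploration and exploitation over actions also decides which belief hypothesis to expand, so the hypotheses tree grows only where it affects the decision. All names and the flat (depth-one) structure are illustrative simplifications, not HB-MCP itself.
```python
import math
import random

C_UCB = 1.0  # exploration constant (illustrative value)

def ucb(q_sum, n, n_parent):
    """UCB1 score: empirical mean return plus exploration bonus."""
    return float("inf") if n == 0 else q_sum / n + C_UCB * math.sqrt(math.log(n_parent) / n)

def hb_mcp_step(hypotheses, actions, simulate, iters=1000):
    """Sketch of UCB-guided growth over hypotheses and actions jointly.

    hypotheses: list of state samplers, one per discrete assignment (e.g., data association)
    actions:    list of hashable actions
    simulate:   simulate(state, action) -> sampled return (rollout value)
    """
    stats = {(h, a): [0.0, 0] for h in range(len(hypotheses)) for a in actions}
    total = 0
    for _ in range(iters):
        total += 1
        # Jointly select which hypothesis to refine and which action to try.
        h, a = max(stats, key=lambda ha: ucb(stats[ha][0], stats[ha][1], total))
        ret = simulate(hypotheses[h](), a)
        stats[(h, a)][0] += ret
        stats[(h, a)][1] += 1
    # Final decision: mean return per action, marginalized over hypotheses.
    def mean_q(a):
        n = sum(stats[(h, a)][1] for h in range(len(hypotheses)))
        return sum(stats[(h, a)][0] for h in range(len(hypotheses))) / max(n, 1)
    return max(actions, key=mean_q)

# Toy usage: two data-association hypotheses, two candidate actions.
best = hb_mcp_step(
    hypotheses=[lambda: -1.0, lambda: 1.0],  # degenerate "beliefs" for brevity
    actions=["left", "right"],
    simulate=lambda s, a: (s if a == "right" else -s) + random.gauss(0.0, 0.1),
)
```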
Related papers
- Value Gradients with Action Adaptive Search Trees in Continuous (PO)MDPs [7.170248667518935]
Solving POMDPs in continuous state, action and observation spaces is key for autonomous planning in real-world mobility and robotics applications.
We formulate a novel Multiple Importance Sampling tree for value estimation that allows value information to be shared between sibling action branches.
Second, we propose a novel methodology to compute value gradients with online sampling based on transition likelihoods.
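As a rough illustration of the sample-sharing idea (not the paper's actual estimator), a balance-heuristic multiple importance sampling estimate lets rollouts simulated under one sibling action contribute to the value of another, reweighted by transition likelihoods; all names below are illustrative.
```python
def mis_q_estimate(samples, actions, trans_lik, target):
    """Balance-heuristic MIS estimate of Q(target), reusing rollouts
    generated under every sibling action branch.

    samples:   list of (action_used, next_state, ret) tuples
    trans_lik: trans_lik(next_state, action) -> p(next_state | action)
    """
    counts = {a: sum(1 for u, _, _ in samples if u == a) for a in actions}
    estimate = 0.0
    for _, s_next, ret in samples:
        denom = sum(counts[a] * trans_lik(s_next, a) for a in actions)
        if denom > 0.0:
            # Weight each return by how likely `target` was to have produced it.
            estimate += ret * trans_lik(s_next, target) / denom
    return estimate
```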
arXiv Detail & Related papers (2025-03-15T15:51:06Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
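The interaction model can be stated compactly; below is a schematic sketch of the protocol described above (the `mdp` and `learner` interfaces are hypothetical).
```python
import random

def run_lmdp_epochs(mdps, weights, learner, horizon, epochs):
    """Latent MDP protocol: at the start of every epoch nature draws one MDP
    from an unknown mixture; the learner never observes which one was drawn."""
    for _ in range(epochs):
        mdp = random.choices(mdps, weights=weights)[0]  # latent draw
        state = mdp.reset()
        for _ in range(horizon):
            action = learner.act(state)        # may condition on full history
            state, reward = mdp.step(action)
            learner.observe(state, reward)
```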
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- A Bayesian Approach to Online Planning [14.847090489758992]
The combination of Monte Carlo tree search and neural networks has revolutionized online planning.
We ask whether uncertainty estimates about the network outputs could be used to improve planning.
We develop a Bayesian planning approach that facilitates such uncertainty quantification, inspired by classical ideas from the meta-reasoning literature.
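The abstract does not spell out the mechanism, but one standard way to use uncertainty estimates inside search, shown purely as an illustrative sketch rather than the paper's method, is Thompson-style selection: sample a plausible value for each child from the network's predictive distribution and descend into the best draw.
```python
import random

def thompson_select(children):
    """children: list of (mean, std) value predictions for each child node.
    Sample one plausible value per child from its predictive distribution
    and descend into the best draw."""
    draws = [random.gauss(mu, sigma) for mu, sigma in children]
    return max(range(len(children)), key=draws.__getitem__)
```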
arXiv Detail & Related papers (2024-06-04T08:33:17Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn from high-quality traces of POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
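To give a flavor of what a learned belief-based specification might look like once translated out of ASP, here is a hypothetical rule for the Tiger benchmark (mentioned further down this list), written in Python rather than ASP for consistency with the other sketches; the threshold and action names are illustrative.
```python
def tiger_policy(belief_tiger_left, threshold=0.9):
    """Hypothetical interpretable rule: only commit to a door once the
    belief is confident; otherwise keep listening."""
    if belief_tiger_left >= threshold:
        return "open_right"
    if belief_tiger_left <= 1.0 - threshold:
        return "open_left"
    return "listen"
```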
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
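Schematically, optimism-based methods of this kind select the policy maximizing an estimated value plus a bonus; per the abstract, the bonus here is constructed to dominate the total variation distance between the estimated and true models. The display below is only the generic shape of such a rule, not the paper's exact expression.
```latex
\hat{\pi}_t \in \arg\max_{\pi}\; \widehat{V}^{\pi}_t + b_t(\pi),
\qquad
b_t(\pi) \;\ge\; \mathrm{TV}\!\big(\widehat{\mathbb{P}}^{\pi}_t,\; \mathbb{P}^{\pi}\big).
```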
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments [21.548271801592907]
We propose an online Monte-Carlo Tree Search based planning method for type-based reasoning in large partially observable environments.
POTMMCP incorporates a novel meta-policy for guiding search and evaluating beliefs, allowing it to search more effectively to longer horizons.
We show that our method converges to the optimal solution in the limit and empirically demonstrate that it effectively adapts online to diverse sets of other agents.
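One plausible reading of "meta-policy for guiding search", shown here as a hypothetical PUCT-style sketch rather than POTMMCP's actual rule, is to use the meta-policy's action probabilities as a prior that biases node selection.
```python
import math

def puct_select(children, prior, n_parent, c=1.25):
    """children: {action: (q_sum, visits)}; prior: {action: meta-policy prob}.
    PUCT-style score: mean value plus prior-weighted exploration bonus."""
    def score(a):
        q, n = children[a]
        mean = q / n if n else 0.0
        return mean + c * prior[a] * math.sqrt(n_parent) / (1 + n)
    return max(children, key=score)
```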
arXiv Detail & Related papers (2023-06-09T17:43:49Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning in environments with continuous state and action spaces.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
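A minimal sketch of the sharing insight (illustrative, not the paper's implementation): states reached during planning are grouped, and each group, rather than each individual state, maintains one action distribution.
```python
import random

def cluster_key(state, resolution=1.0):
    """Bucket continuous states onto a coarse grid. (CMCGS itself learns its
    state clusters during search; a fixed grid is used here for brevity.)"""
    return tuple(round(x / resolution) for x in state)

def sample_action(policies, state, default):
    """One shared Gaussian action policy per cluster of states, instead of
    a separate policy per visited state."""
    mu, sigma = policies.get(cluster_key(state), default)
    return [random.gauss(m, s) for m, s in zip(mu, sigma)]
```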
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
- Cooperative Trajectory Planning in Uncertain Environments with Monte Carlo Tree Search and Risk Metrics [2.658812114255374]
We extend an existing cooperative trajectory planning approach based on Monte Carlo Tree Search for continuous action spaces.
It does so by explicitly modeling uncertainties in the form of a root belief state, from which start states for trees are sampled.
We demonstrate that integrating risk metrics into the final selection policy consistently outperforms a baseline in uncertain environments.
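For instance, a risk metric such as CVaR in the final selection step would score each action by the mean of its worst-case returns across trajectories started from sampled root-belief states; the sketch below uses CVaR as one natural choice, not necessarily the paper's exact metric.
```python
def cvar(returns, alpha=0.2):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns."""
    worst = sorted(returns)[: max(1, int(alpha * len(returns)))]
    return sum(worst) / len(worst)

def risk_aware_choice(returns_per_action, alpha=0.2):
    """Final selection: prefer the action with the best risk-adjusted outcome
    over trajectories sampled from the root belief."""
    return max(returns_per_action, key=lambda a: cvar(returns_per_action[a], alpha))
```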
arXiv Detail & Related papers (2022-03-09T00:14:41Z)
- Adaptive Belief Discretization for POMDP Planning [7.508023795800546]
Many POMDP solvers uniformly discretize the belief space and give the planning error in terms of the (typically unknown) covering number.
We propose an adaptive belief discretization scheme, and give its associated planning error.
We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
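In caricature (a hypothetical sketch, not the paper's scheme), belief discretization merges beliefs that round to the same coarse key, and the planner controls how coarse that rounding is.
```python
def discretize_belief(belief, resolution):
    """Map a belief (dict: state -> probability) to a coarse key; beliefs
    that share a key are treated as one point during planning. Adapting
    `resolution` per region is the paper's contribution; it is a fixed
    argument here."""
    return tuple(sorted((s, round(p / resolution) * resolution)
                        for s, p in belief.items()))
```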
arXiv Detail & Related papers (2021-04-15T07:04:32Z)
- Monte Carlo Information-Oriented Planning [6.0158981171030685]
We discuss how to solve information-gathering problems expressed as rho-POMDPs.
We build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs.
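In a rho-POMDP the reward is a function of the belief itself, e.g., negative entropy when the goal is information gathering. A particle-based estimate of such a reward, as a POMCP-style tree would carry it, might look like the following illustrative sketch.
```python
import math
from collections import Counter

def neg_entropy_reward(particles):
    """rho(b) = -H(b), estimated from the particle set at a belief node.
    Rewards information gain: more peaked beliefs score higher."""
    counts = Counter(particles)
    n = len(particles)
    return sum((c / n) * math.log(c / n) for c in counts.values())
```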
arXiv Detail & Related papers (2021-03-21T09:09:27Z)
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
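A toy version of such a trace rule for Tiger (hypothetical; the paper learns richer logical properties): flag any step where the policy opens a door while the belief is still uncertain.
```python
def unexpected_decisions(trace, threshold=0.9):
    """trace: list of (belief_tiger_left, action) pairs from POMCP runs.
    Returns the step indices violating the rule 'only open when confident'."""
    return [i for i, (b, a) in enumerate(trace)
            if a != "listen" and 1.0 - threshold < b < threshold]
```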
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Planning in Markov Decision Processes with Gap-Dependent Sample Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability.
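The gap-dependent flavor is conveyed by the stopping rule of best-arm-identification style planners; the sketch below is schematic, not MDP-GapE's exact rule: stop sampling once the confidence lower bound of the leading action clears every rival's upper bound.
```python
def can_stop(bounds):
    """bounds: {action: (lower, upper)} confidence bounds on Q-values.
    Stop when some action's lower bound beats all other upper bounds."""
    for a, (lo, _) in bounds.items():
        if all(lo >= up for b, (_, up) in bounds.items() if b != a):
            return a  # near-optimal action identified with high probability
    return None
```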
arXiv Detail & Related papers (2020-06-10T15:05:51Z)