Adaptive Belief Discretization for POMDP Planning
- URL: http://arxiv.org/abs/2104.07276v1
- Date: Thu, 15 Apr 2021 07:04:32 GMT
- Title: Adaptive Belief Discretization for POMDP Planning
- Authors: Divya Grover, Christos Dimitrakakis
- Abstract summary: Many POMDP solvers uniformly discretize the belief space and give the planning error in terms of the (typically unknown) covering number.
We propose an adaptive belief discretization scheme, and give its associated planning error.
We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
- Score: 7.508023795800546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Partially Observable Markov Decision Process (POMDP) is a widely
used model of the interaction between an agent and an environment under state
uncertainty. Since the agent does not observe the environment state, its
uncertainty is typically represented through a probabilistic belief. While the
set of possible beliefs is infinite, making exact planning intractable, the
belief space's complexity (and hence planning complexity) is characterized by
its covering number. Many POMDP solvers uniformly discretize the belief space
and give the planning error in terms of the (typically unknown) covering
number. We instead propose an adaptive belief discretization scheme, and give
its associated planning error. We furthermore characterize the covering number
with respect to the POMDP parameters. This allows us to specify the exact
memory requirements on the planner, needed to bound the value function error.
We then propose a novel, computationally efficient solver using this scheme. We
demonstrate that our algorithm is highly competitive with the state of the art
in a variety of scenarios.
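To make the abstract concrete, here is a minimal sketch of the two basic operations involved: an exact Bayes-filter belief update, and a fixed-resolution discretization that maps a belief to a nearby grid point. The array layout (T, O) and the rounding scheme are illustrative assumptions, not the paper's implementation; the paper's contribution is an adaptive choice of the resolution, which this sketch deliberately does not reproduce.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact Bayes filter for a discrete POMDP.
    b: belief over n states; T[a][s, s'] = P(s' | s, a);
    O[a][s', o] = P(o | s', a). (Layout is an assumption.)"""
    b_pred = T[a].T @ b            # predict: P(s' | b, a)
    b_new = O[a][:, o] * b_pred    # correct: weight by observation likelihood
    norm = b_new.sum()
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return b_new / norm

def discretize(b, resolution):
    """Round each coordinate to a multiple of 1/resolution and renormalize.
    A finer resolution shrinks the discretization error but inflates the
    number of grid points -- the covering-number trade-off in the abstract."""
    g = np.round(b * resolution) / resolution
    return g / g.sum()

# Toy two-state example: one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])    # T[0][s, s']
O = np.array([[[0.85, 0.15], [0.3, 0.7]]])  # O[0][s', o]
b = belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O)
print(b, discretize(b, resolution=10))      # ~[0.208 0.792] -> [0.2 0.8]
```

A uniform grid like this one has memory cost scaling with the covering number of the whole belief simplex; the abstract's point is that an adaptive resolution lets one state exact memory requirements sufficient to bound the value-function error.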
Related papers
- Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Prospective Side Information for Latent MDPs [80.00842638151558]
We study the class of LMDPs with prospective side information, where an agent receives additional, weakly revealing information at the beginning of each episode.
We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments.
We then establish that any sample-efficient algorithm must suffer at least $\Omega(K^{2/3})$ regret, as opposed to the standard $\Omega(\sqrt{K})$ lower bound, and design an algorithm with a matching upper bound.
arXiv Detail & Related papers (2023-10-11T15:37:31Z) - Robots That Ask For Help: Uncertainty Alignment for Large Language Model
Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion (a generic split-conformal sketch appears after this list).
arXiv Detail & Related papers (2023-07-04T21:25:12Z) - Simplified Continuous High Dimensional Belief Space Planning with
Adaptive Probabilistic Belief-dependent Constraints [9.061408029414453]
Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem.
We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint.
We apply our method to active SLAM, a highly challenging problem of high dimensional Belief Space Planning.
arXiv Detail & Related papers (2023-02-13T21:22:47Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Adaptive Information Belief Space Planning [9.365993173260316]
We focus on making informed decisions efficiently, using reward functions that explicitly deal with uncertainty.
We derive bounds on the expected information-theoretic reward function and, as a consequence, on the value function.
We then propose a method to refine aggregation to achieve identical action selection with a fraction of the computational time.
arXiv Detail & Related papers (2022-01-14T21:12:00Z) - Planning in Observable POMDPs in Quasipolynomial Time [21.03037504572896]
We develop a quasipolynomial-time algorithm for planning in observable POMDPs.
We assume that well-separated distributions on states lead to well-separated distributions on observations.
We prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
arXiv Detail & Related papers (2022-01-12T23:16:37Z) - Efficient Belief Space Planning in High-Dimensional State Spaces using
PIVOT: Predictive Incremental Variable Ordering Tactic [11.878820609988693]
We examine the problem of online decision making under uncertainty, which we formulate as planning in the belief space.
We call this approach PIVOT: Predictive Incremental Variable Ordering Tactic.
Applying this tactic can also improve the state inference efficiency.
arXiv Detail & Related papers (2021-12-29T07:30:47Z) - Minimax Regret Optimisation for Robust Planning in Uncertain Markov
Decision Processes [3.5289688061934963]
Minimax regret has been proposed as an objective for planning in Uncertain MDPs to find robust policies.
We introduce a Bellman equation to compute the regret for a policy.
We show that it optimises minimax regret exactly for UMDPs with independent uncertainties.
arXiv Detail & Related papers (2020-12-08T18:48:14Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, nonconvex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)