Online POMDP Planning with Anytime Deterministic Guarantees
- URL: http://arxiv.org/abs/2310.01791v2
- Date: Wed, 4 Oct 2023 19:51:56 GMT
- Title: Online POMDP Planning with Anytime Deterministic Guarantees
- Authors: Moran Barenboim and Vadim Indelman
- Abstract summary: Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
- Score: 11.157761902108692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous agents operating in real-world scenarios frequently encounter
uncertainty and make decisions based on incomplete information. Planning under
uncertainty can be mathematically formalized using partially observable Markov
decision processes (POMDPs). However, finding an optimal plan for POMDPs can be
computationally expensive and is feasible only for small tasks. In recent
years, approximate algorithms, such as tree search and sample-based
methodologies, have emerged as state-of-the-art POMDP solvers for larger
problems. Despite their effectiveness, these algorithms offer only
probabilistic and often asymptotic guarantees toward the optimal solution due
to their dependence on sampling. To address these limitations, we derive a
deterministic relationship between a simplified solution that is easier to
obtain and the theoretically optimal one. First, we derive bounds for selecting
a subset of the observations to branch from while computing a complete belief
at each posterior node. Then, since a complete belief update may be
computationally demanding, we extend the bounds to support reduction of both
the state and the observation spaces. We demonstrate how our guarantees can be
integrated with existing state-of-the-art solvers that sample a subset of
states and observations. As a result, the returned solution holds deterministic
bounds relative to the optimal policy. Lastly, we substantiate our findings
with supporting experimental results.
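The abstract outlines the core idea: branch on only a subset of observations (and, further, a subset of states) at each belief node, and bound the gap to the optimal value deterministically. The snippet below is a minimal illustrative sketch of that idea under simplifying assumptions, not the paper's algorithm or its exact bounds: it evaluates a finite-horizon belief tree, expands only a chosen subset of observations per node, and charges the un-branched observation probability mass with worst-case and best-case per-step rewards to obtain deterministic lower and upper bounds on the optimal value. The model interface, helper names, and toy example are all hypothetical.

```python
from types import SimpleNamespace


def belief_update(b, a, o, m):
    """Exact Bayes update of a discrete belief b = {state: prob}.
    Returns the posterior belief and the normalizer P(o | b, a)."""
    post = {}
    for sp in m.states:
        p = m.obs[(o, sp, a)] * sum(m.trans[(sp, s, a)] * b[s] for s in b)
        if p > 0.0:
            post[sp] = p
    norm = sum(post.values())
    if norm == 0.0:                                 # observation impossible under (b, a)
        return {}, 0.0
    return {s: p / norm for s, p in post.items()}, norm


def bounded_value(b, d, m, pick_obs):
    """Deterministic (lower, upper) bounds on the optimal d-step value of belief b.
    m bundles the (hypothetical) model: states, actions, observations, trans, obs,
    reward, gamma, r_min, r_max. pick_obs(b, a) chooses which observations to branch on."""
    if d == 0:
        return 0.0, 0.0
    geom = sum(m.gamma ** k for k in range(d - 1))  # bounds any (d-1)-step discounted return
    best_lo = best_hi = float("-inf")
    for a in m.actions:
        r = sum(p * m.reward[(s, a)] for s, p in b.items())
        lo = hi = mass = 0.0
        for o in pick_obs(b, a):                    # branch on a SUBSET of observations
            bp, p_o = belief_update(b, a, o, m)
            c_lo, c_hi = bounded_value(bp, d - 1, m, pick_obs)
            lo, hi, mass = lo + p_o * c_lo, hi + p_o * c_hi, mass + p_o
        lo += (1.0 - mass) * geom * m.r_min         # un-branched mass: worst case
        hi += (1.0 - mass) * geom * m.r_max         # un-branched mass: best case
        best_lo = max(best_lo, r + m.gamma * lo)
        best_hi = max(best_hi, r + m.gamma * hi)
    return best_lo, best_hi


# Toy usage: a 2-state "guess the hidden state" model with noisy observations.
S, A, O = ["s0", "s1"], ["a0", "a1"], ["o0", "o1"]
m = SimpleNamespace(
    states=S, actions=A, observations=O, gamma=0.95, r_min=-1.0, r_max=1.0,
    trans={(sp, s, a): float(sp == s) for sp in S for s in S for a in A},
    obs={(o, sp, a): 0.85 if o[-1] == sp[-1] else 0.15 for o in O for sp in S for a in A},
    reward={(s, a): 1.0 if s[-1] == a[-1] else -1.0 for s in S for a in A},
)


def most_likely_obs(b, a):
    """Branch only on the single most probable observation under (b, a)."""
    def p(o):
        return sum(m.obs[(o, sp, a)] * sum(m.trans[(sp, s, a)] * b[s] for s in b)
                   for sp in m.states)
    return [max(m.observations, key=p)]


print(bounded_value({"s0": 0.5, "s1": 0.5}, 3, m, most_likely_obs))  # (lower, upper)
```

If the subset-selection rule returns every observation, the two bounds coincide with the exact finite-horizon value; the smaller the subset, the cheaper the computation and the looser the bounds, which mirrors the anytime flavor described in the abstract.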
Related papers
- Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem.
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Interpretable Decision Tree Search as a Markov Decision Process [8.530182510074983]
Finding an optimal decision tree for a supervised learning task is a challenging problem to solve at scale.
It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling.
We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that produces a compact set of candidate tests for every state.
arXiv Detail & Related papers (2023-09-22T08:18:08Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets [10.26382228865201]
We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets.
The corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable.
We thus present the first complete solution scheme for robust MDPs with non-rectangular uncertainty sets offering global optimality guarantees.
arXiv Detail & Related papers (2023-05-30T13:02:25Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs [24.256960622176305]
We propose the first (nearly) matching upper and lower bounds on the sample complexity of PAC RL in episodic Markov decision processes.
Our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap.
Their design and analyses employ novel ideas, including graph-theoretical concepts such as minimum flows and maximum cuts.
arXiv Detail & Related papers (2022-03-17T11:19:41Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the proposed solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model is assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Tightly Robust Optimization via Empirical Domain Reduction [22.63829081634384]
We propose an algorithm to determine the scale such that the solution has a good objective value.
Under some regularity conditions, the scale obtained by our algorithm is $O(\sqrt{n})$, whereas the scale obtained by a standard approach is $O(\sqrt{d/n})$.
arXiv Detail & Related papers (2020-02-29T12:24:56Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique via node cloning which ensures a run-time for solving our target problem independently of the form of a corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)
- Sparse tree search optimality guarantees in POMDPs with continuous observation spaces [39.17638795259191]
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems.
Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces.
This work offers a theoretical justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), estimates Q-values accurately with high probability and can be made to perform arbitrarily close to the optimal solution.
arXiv Detail & Related papers (2019-10-10T02:22:55Z)
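The POWSS entry above hinges on observation-likelihood weighting: in continuous observation spaces one cannot branch on every observation, so particles are re-weighted by the likelihood of a sampled observation instead. Below is a small, generic sketch of that re-weighting step under assumed model interfaces (step_fn, obs_pdf); it is a simplified rendition of the general technique, not the POWSS algorithm itself.

```python
def weighted_posterior(particles, weights, action, obs, step_fn, obs_pdf):
    """Propagate a weighted particle belief through (action, obs).

    particles: list of states; weights: matching non-negative weights.
    step_fn(s, a) -> next state sampled from the transition model (assumed interface).
    obs_pdf(o, s_next, a) -> observation density (assumed interface).
    """
    next_particles, next_weights = [], []
    for s, w in zip(particles, weights):
        s_next = step_fn(s, action)
        # Re-weight by the likelihood of the fixed, already-sampled observation,
        # instead of branching on every possible continuous observation.
        next_particles.append(s_next)
        next_weights.append(w * obs_pdf(obs, s_next, action))
    total = sum(next_weights)
    if total > 0.0:
        next_weights = [w / total for w in next_weights]
    return next_particles, next_weights
```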