Optimality Guarantees for Particle Belief Approximation of POMDPs
- URL: http://arxiv.org/abs/2210.05015v5
- Date: Thu, 19 Oct 2023 14:10:06 GMT
- Title: Optimality Guarantees for Particle Belief Approximation of POMDPs
- Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J.
Tomlin, Zachary N. Sunberg
- Abstract summary: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
- Score: 55.83001584645448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible
representation for real-world decision and control problems. However, POMDPs
are notoriously difficult to solve, especially when the state and observation
spaces are continuous or hybrid, which is often the case for physical systems.
While recent online sampling-based POMDP algorithms that plan with observation
likelihood weighting have shown practical effectiveness, a general theory
characterizing the approximation error of the particle filtering techniques
that these algorithms use has not previously been proposed. Our main
contribution is bounding the error between any POMDP and its corresponding
finite sample particle belief MDP (PB-MDP) approximation. This fundamental
bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP
algorithm to a POMDP by solving the corresponding particle belief MDP, thereby
extending the convergence guarantees of the MDP algorithm to the POMDP.
Practically, this is implemented by using the particle filter belief transition
model as the generative model for the MDP solver. While this requires access to
the observation density model from the POMDP, it only increases the transition
sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where
$C$ is the number of particles. Thus, when combined with sparse sampling MDP
algorithms, this approach can yield algorithms for POMDPs that have no direct
theoretical dependence on the size of the state and observation spaces. In
addition to our theoretical contribution, we perform five numerical experiments
on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using
PB-MDP approximation, Sparse-PFT, achieves performance competitive with other
leading continuous observation POMDP solvers.
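To make the construction concrete, the following Python code is a minimal sketch (not the authors' implementation) of one transition of a particle belief MDP: the belief is a set of $C$ weighted particles, and a transition samples an observation from one particle, propagates every particle through the POMDP generative model, and reweights by the observation density. The interfaces `step` (generative model returning next state, observation, and reward) and `obs_density` (observation density $\mathcal{Z}(o \mid a, s')$) are hypothetical placeholders.

```python
import numpy as np

def pb_mdp_step(particles, weights, action, step, obs_density, rng):
    """One transition of the particle belief MDP (PB-MDP) -- a hedged sketch.

    particles : list of C states forming the current particle belief.
    weights   : their normalized weights (sum to 1).
    step(s, a, rng) -> (s_next, obs, reward) : hypothetical POMDP generative model.
    obs_density(obs, s_next, a) -> float     : observation likelihood Z(o | a, s').
    Returns (next_particles, next_weights, belief_reward).
    """
    C = len(particles)

    # Generate a single observation from one state sampled from the belief.
    i = rng.choice(C, p=weights)
    _, obs, _ = step(particles[i], action, rng)

    # Propagate every particle and reweight it by the likelihood of that
    # observation; this loop is the O(C) overhead noted in the abstract.
    next_particles = []
    next_weights = np.empty(C)
    belief_reward = 0.0
    for j in range(C):
        s_next, _, r = step(particles[j], action, rng)
        next_particles.append(s_next)
        next_weights[j] = weights[j] * obs_density(obs, s_next, action)
        belief_reward += weights[j] * r      # belief reward = weighted mean reward
    next_weights /= next_weights.sum()       # normalize (assumes nonzero total)
    return next_particles, next_weights, belief_reward
```

Any sampling-based MDP solver (e.g., sparse sampling or MCTS) can then call a routine like `pb_mdp_step` as its generative model over particle beliefs; each solver transition makes $C$ generative-model queries and $C$ density evaluations, matching the $\mathcal{O}(C)$ overhead stated above.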
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Stochastic Gradient Piecewise Deterministic Monte Carlo Samplers [3.487370856323828]
Recent work has suggested using Monte Carlo methods based on piecewise deterministic Markov processes (PDMPs) to sample from target distributions of interest.
We propose approximate simulation of PDMPs with sub-sampling for scalable sampling from posterior distributions.
We show these methods are easy to implement, present results on their approximation error and demonstrate numerically that this class of algorithms has similar efficiency to gradient Langevin dynamics.
arXiv Detail & Related papers (2024-06-27T09:59:28Z) - Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z) - Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy.
We demonstrate that the sample complexity of the PEDEL algorithm of Wagenmaker et al. closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z) - Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency. (A minimal sketch of this particle-ensemble representation appears after this list.)
arXiv Detail & Related papers (2023-02-10T08:05:19Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to address safe RL problems under the framework of Early Terminated MDP (ET-MDP).
We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better performance and improved learning efficiency.
arXiv Detail & Related papers (2021-07-09T04:24:40Z) - A Relation Analysis of Markov Decision Process Frameworks [26.308541799686505]
We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literature.
We show that the entropy-regularized MDP is equivalent to an MDP model, and is strictly subsumed by the general regularized MDP.
arXiv Detail & Related papers (2020-08-18T09:27:26Z) - Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z)
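As a minimal, hedged illustration of the probabilistic PDE representation referenced in the Monte Carlo Neural PDE Solver entry above (not that paper's training procedure), the sketch below estimates the one-dimensional heat equation $u_t = u_{xx}$ pointwise by averaging the initial condition over an ensemble of diffusing random particles, i.e. the Feynman-Kac representation $u(t,x) = \mathbb{E}[u_0(x + \sqrt{2t}\,Z)]$ with $Z \sim \mathcal{N}(0,1)$; all function names are illustrative.

```python
import numpy as np

def heat_mc(u0, x, t, n_samples=100_000, rng=None):
    """Monte Carlo estimate of u(t, x) for u_t = u_xx with u(0, .) = u0.

    Uses the probabilistic (Feynman-Kac) representation
        u(t, x) = E[u0(x + sqrt(2 t) Z)],   Z ~ N(0, 1),
    i.e. the solution is an average over an ensemble of random particles
    that diffuse from x for time t.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n_samples)
    return u0(x + np.sqrt(2.0 * t) * z).mean()

# Example: Gaussian initial condition, for which the exact solution is
# u(t, x) = exp(-x^2 / (1 + 4 t)) / sqrt(1 + 4 t).
u0 = lambda x: np.exp(-x**2)
print(heat_mc(u0, x=0.5, t=0.25, n_samples=200_000))  # ~ exp(-0.125)/sqrt(2) ≈ 0.624
```

Estimates of this kind provide particle-ensemble targets at arbitrary points $(t, x)$; how such targets are used to train an unsupervised neural solver is specific to the paper cited above.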