Related papers: Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

URL: http://arxiv.org/abs/1910.04332v4
Date: Mon, 5 Jun 2023 05:40:41 GMT
Title: Sparse tree search optimality guarantees in POMDPs with continuous observation spaces
Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg
Abstract summary: Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution.
Score: 39.17638795259191
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.

Related papers

Observation Adaptation via Annealed Importance Resampling for Partially Observable Markov Decision Processes [4.830416359005018]
Partially observable Markov decision processes (POMDPs) are a general mathematical model for sequential decision-making in environments under state uncertainty. Online solvers typically use bootstrap particle filters based on importance resampling for updating the belief distribution. We propose an approach that constructs a sequence of bridge distributions between the state-transition and optimal distributions through iterative Monte Carlo steps.
arXiv Detail & Related papers (2025-03-25T03:05:00Z)
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem. Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs) Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks. We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the non-AL equilibrium nonconptotic objective function in two-player zero-sum continuous games. Our novel insights into the particle-based algorithms for continuous distribution strategies are presented.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders. We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv Detail & Related papers (2023-02-01T18:40:53Z)
Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities. We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability. The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Decision Processes (POMDPs) Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z)
On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data. It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model is assumed to be known. We find optimal policies for the approximate belief model under mild non-linear filter stability conditions. We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.