Planning in Observable POMDPs in Quasipolynomial Time
- URL: http://arxiv.org/abs/2201.04735v1
- Date: Wed, 12 Jan 2022 23:16:37 GMT
- Title: Planning in Observable POMDPs in Quasipolynomial Time
- Authors: Noah Golowich, Ankur Moitra, and Dhruv Rohatgi
- Abstract summary: We develop a quasipolynomial-time algorithm for planning in observable POMDPs.
We assume that well-separated distributions on states lead to well-separated distributions on observations.
We prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
- Score: 21.03037504572896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially Observable Markov Decision Processes (POMDPs) are a natural and
general model in reinforcement learning that take into account the agent's
uncertainty about its current state. In the literature on POMDPs, it is
customary to assume access to a planning oracle that computes an optimal policy
when the parameters are known, even though the problem is known to be
computationally hard. Almost all existing planning algorithms either run in
exponential time, lack provable performance guarantees, or require placing
strong assumptions on the transition dynamics under every possible policy. In
this work, we revisit the planning problem and ask: are there natural and
well-motivated assumptions that make planning easy?
Our main result is a quasipolynomial-time algorithm for planning in
(one-step) observable POMDPs. Specifically, we assume that well-separated
distributions on states lead to well-separated distributions on observations,
and thus the observations are at least somewhat informative in each step.
Crucially, this assumption places no restrictions on the transition dynamics of
the POMDP; nevertheless, it implies that near-optimal policies admit
quasi-succinct descriptions, which is not true in general (under standard
hardness assumptions). Our analysis is based on new quantitative bounds for
filter stability -- i.e. the rate at which an optimal filter for the latent
state forgets its initialization. Furthermore, we prove matching hardness for
planning in observable POMDPs under the Exponential Time Hypothesis.
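To make the observability assumption concrete: one plausible formalization, consistent with the abstract's phrasing (the paper's exact definition may differ in constants and norms), is that the observation matrix $O$ satisfies $\|O^\top b - O^\top b'\|_1 \ge \gamma \|b - b'\|_1$ for all state distributions $b, b'$ and some $\gamma > 0$. The toy Python sketch below is illustrative only and not taken from the paper; the POMDP instance, the helper names observability_constant and filter_update, and all numeric choices are hypothetical. It estimates $\gamma$ empirically and then illustrates the filter-stability phenomenon: two Bayes filters started from different priors and fed the same observation sequence tend to contract toward each other.

```python
import numpy as np

# Toy single-action POMDP (hypothetical, for illustration only).
# T[s, s'] = P(next state s' | state s); Obs[s, o] = P(observation o | state s).
rng = np.random.default_rng(0)
S, O = 4, 6
T = rng.dirichlet(np.ones(S), size=S)    # S x S transition matrix
Obs = rng.dirichlet(np.ones(O), size=S)  # S x O observation matrix

def observability_constant(Obs, trials=2000):
    """Empirically estimate gamma with ||Obs^T (b - b')||_1 >= gamma * ||b - b'||_1."""
    gamma = np.inf
    for _ in range(trials):
        b1, b2 = rng.dirichlet(np.ones(S)), rng.dirichlet(np.ones(S))
        diff = np.abs(b1 - b2).sum()
        if diff > 1e-12:
            gamma = min(gamma, np.abs(Obs.T @ (b1 - b2)).sum() / diff)
    return gamma

def filter_update(b, obs, T, Obs):
    """One Bayes-filter step: propagate the belief through T, then condition on obs."""
    pred = T.T @ b               # predicted distribution over next states
    post = pred * Obs[:, obs]    # reweight by the observation likelihood
    return post / post.sum()

print("estimated gamma:", observability_constant(Obs))

# Filter stability: run the same observation sequence through two filters
# that start from different priors and watch the L1 distance between them shrink.
b_a = np.array([1.0, 0.0, 0.0, 0.0])  # prior matching the true initial state
b_b = np.ones(S) / S                  # misspecified uniform prior
state = 0
for t in range(15):
    state = rng.choice(S, p=T[state])
    obs = rng.choice(O, p=Obs[state])
    b_a, b_b = filter_update(b_a, obs, T, Obs), filter_update(b_b, obs, T, Obs)
    print(t, round(np.abs(b_a - b_b).sum(), 4))
```

The printed $L_1$ distances will typically decay within a few steps; the paper's contribution includes quantitative bounds on how fast an observable POMDP's filter forgets its initialization, which in turn underlies the quasi-succinct near-optimal policies and the quasipolynomial-time planner.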
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within a lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Learning in Observable POMDPs, without Computationally Intractable Oracles [23.636033995089587]
We develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions.
Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-separated distributions over observations.
arXiv Detail & Related papers (2022-06-07T17:05:27Z)
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z)
- Efficient Belief Space Planning in High-Dimensional State Spaces using PIVOT: Predictive Incremental Variable Ordering Tactic [11.878820609988693]
We examine the problem of online decision making under uncertainty, which we formulate as planning in the belief space.
We call this approach PIVOT: Predictive Incremental Variable Ordering Tactic.
Applying this tactic can also improve the state inference efficiency.
arXiv Detail & Related papers (2021-12-29T07:30:47Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Adaptive Belief Discretization for POMDP Planning [7.508023795800546]
Many POMDP solvers uniformly discretize the belief space and give the planning error in terms of the (typically unknown) covering number.
We propose an adaptive belief discretization scheme, and give its associated planning error.
We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
arXiv Detail & Related papers (2021-04-15T07:04:32Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.