Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings
- URL: http://arxiv.org/abs/2206.12081v1
- Date: Fri, 24 Jun 2022 05:13:35 GMT
- Title: Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings
- Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
- Abstract summary: We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
- Score: 97.12538243736705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study reinforcement learning with function approximation for large-scale
Partially Observable Markov Decision Processes (POMDPs) where the state space
and observation space are large or even continuous. Particularly, we consider
Hilbert space embeddings of POMDP where the feature of latent states and the
feature of observations admit a conditional Hilbert space embedding of the
observation emission process, and the latent state transition is deterministic.
Under the function approximation setup where the optimal latent state-action
$Q$-function is linear in the state feature, and the optimal $Q$-function has a
gap in actions, we provide a \emph{computationally and statistically efficient}
algorithm for finding the \emph{exact optimal} policy. We show our algorithm's
computational and statistical complexities scale polynomially with respect to
the horizon and the intrinsic dimension of the feature on the observation
space. Furthermore, we show both the deterministic latent transitions and gap
assumptions are necessary to avoid statistical complexity exponential in
horizon or dimension. Since our guarantee does not have an explicit dependence
on the size of the state and observation spaces, our algorithm provably scales
to large-scale POMDPs.
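As a rough formal reading of these assumptions (the notation $\phi$, $\psi$, $M_h$, $T_h$, $\theta_{h,a}$, $a^{\star}_h$, and $\Delta$ below is ours, introduced only for illustration and not taken from the paper), the setting can be summarized as
$$\mathbb{E}\left[\psi(o_h) \mid s_h\right] = M_h\,\phi(s_h) \quad \text{(conditional embedding of the emission process)},$$
$$s_{h+1} = T_h(s_h, a_h) \quad \text{(deterministic latent transition)},$$
$$Q^{\star}_h(s,a) = \langle \theta_{h,a},\, \phi(s) \rangle, \qquad Q^{\star}_h\!\left(s, a^{\star}_h(s)\right) - \max_{a \neq a^{\star}_h(s)} Q^{\star}_h(s,a) \ge \Delta > 0 \quad \text{(linearity and action gap)},$$
where $\phi$ and $\psi$ are the latent-state and observation features and $a^{\star}_h(s)$ is the optimal action; the stated guarantees scale polynomially in the horizon and the intrinsic dimension of $\psi$, with no explicit dependence on the sizes of the state and observation spaces.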
Related papers
- Weighted mesh algorithms for general Markov decision processes: Convergence and tractability [0.9940462449990576]
We introduce a mesh-type approach for tackling discrete-time, finite-horizon Markov Decision Processes (MDPs).
For unbounded state spaces the algorithm is "semi-tractable" in the sense that the complexity is $\epsilon^{-c}$ with some dimension-independent $c \geq 2$.
arXiv Detail & Related papers (2024-06-29T10:08:23Z) - Measurement Simplification in ρ-POMDP with Performance Guarantees [6.129902017281406]
Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information.
This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space.
We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution.
arXiv Detail & Related papers (2023-09-19T15:40:42Z) - Optimal Scaling for Locally Balanced Proposals in Discrete Spaces [65.14092237705476]
We show that efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
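To make the tuning idea concrete, here is a minimal sketch of adapting a discrete proposal's neighborhood size from the observed acceptance rate. It uses a plain symmetric random-walk proposal on the integers rather than the locally balanced proposals the paper analyzes, and the target rate 0.234 is the classical random-walk Metropolis value, used only as a placeholder rather than the value derived in the paper.

```python
import math
import random

# Illustrative sketch only: adapt the neighborhood size of a discrete
# random-walk proposal so the empirical acceptance rate tracks a target.
# 0.234 is the classical random-walk Metropolis target, a placeholder here.
TARGET_RATE = 0.234

def log_target(x: int) -> float:
    """Toy target: an (unnormalized) discretized Gaussian on the integers."""
    return -0.5 * (x / 10.0) ** 2

def adaptive_mh(n_steps: int = 20_000, radius: int = 1):
    x, accepts = 0, 0
    for t in range(1, n_steps + 1):
        # Symmetric proposal: jump up to `radius` steps left or right.
        y = x + random.choice([-1, 1]) * random.randint(1, radius)
        if random.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x, accepts = y, accepts + 1
        # Periodically grow/shrink the neighborhood toward the target rate,
        # the discrete analogue of step-size control in continuous spaces.
        if t % 500 == 0:
            radius = max(1, radius + (1 if accepts / t > TARGET_RATE else -1))
    return accepts / n_steps, radius

if __name__ == "__main__":
    rate, radius = adaptive_mh()
    print(f"final acceptance rate {rate:.3f}, neighborhood radius {radius}")
```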
arXiv Detail & Related papers (2022-09-16T22:09:53Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Provably Efficient Reinforcement Learning in Partially Observable
Dynamical Systems [97.12538243736705]
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as POMDPs, observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model: Hilbert Space Embeddings of POMDPs and observable POMDPs with latent low-rank transition.
arXiv Detail & Related papers (2022-06-24T00:27:42Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Near Optimality of Finite Memory Feedback Policies in Partially Observed
Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z) - Sparse tree search optimality guarantees in POMDPs with continuous
observation spaces [39.17638795259191]
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems.
Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces, although they have lacked a rigorous theoretical justification.
This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution.
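To make the observation-likelihood weighting idea concrete, here is a minimal sketch in the spirit of weighted sparse sampling on a made-up two-state "tiger"-style POMDP. The dynamics, rewards, and all names below are our own toy choices; this is not the POWSS algorithm from the paper, only an illustration of reweighting propagated particles by the likelihood of a sampled observation branch.

```python
import random

# Toy two-state POMDP (our own construction) used only to illustrate
# observation-likelihood weighting inside a sparse-sampling Q estimate;
# this is NOT the POWSS algorithm from the paper, just a simplified sketch.
STATES, ACTIONS, GAMMA = (0, 1), (0, 1, 2), 0.95  # actions: listen, open-0, open-1

def step(s, a):
    """Generative model: next latent state and a noisy observation."""
    s_next = s if a == 0 else random.choice(STATES)          # opening resets the state
    noisy = s_next if random.random() < 0.85 else 1 - s_next
    return s_next, (noisy if a == 0 else random.choice(STATES))

def obs_likelihood(o, s, a):
    return (0.85 if o == s else 0.15) if a == 0 else 0.5

def reward(s, a):
    return -1.0 if a == 0 else (10.0 if a - 1 != s else -100.0)

def q_estimate(particles, weights, a, depth, width=4):
    """Sparse-sampling Q estimate with particles reweighted per observation branch."""
    z = sum(weights)
    r = sum(w * reward(s, a) for s, w in zip(particles, weights)) / z
    if depth == 0:
        return r
    future = 0.0
    for _ in range(width):
        # Sample one observation branch from a weighted particle.
        s0 = random.choices(particles, weights=weights)[0]
        _, o = step(s0, a)
        # Propagate all particles and reweight them by the observation likelihood.
        nxt = [step(s, a)[0] for s in particles]
        new_w = [w * obs_likelihood(o, s2, a) for s2, w in zip(nxt, weights)]
        if sum(new_w) > 0:
            future += max(q_estimate(nxt, new_w, b, depth - 1, width) for b in ACTIONS)
    return r + GAMMA * future / width

if __name__ == "__main__":
    particles = [random.choice(STATES) for _ in range(30)]
    print({a: round(q_estimate(particles, [1.0] * 30, a, depth=2), 2) for a in ACTIONS})
```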
arXiv Detail & Related papers (2019-10-10T02:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.