PAC Reinforcement Learning for Predictive State Representations
- URL: http://arxiv.org/abs/2207.05738v1
- Date: Tue, 12 Jul 2022 17:57:17 GMT
- Title: PAC Reinforcement Learning for Predictive State Representations
- Authors: Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
- Abstract summary: We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in all relevant parameters of the system.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In this paper we study online Reinforcement Learning (RL) in partially
observable dynamical systems. We focus on the Predictive State Representations
(PSRs) model, which is an expressive model that captures other well-known
models such as Partially Observable Markov Decision Processes (POMDPs). A PSR
represents the state using a set of predictions of future observations and is
defined entirely using observable quantities. We develop a novel model-based
algorithm for PSRs that can learn a near-optimal policy with sample complexity
scaling polynomially with respect to all the relevant parameters of the
system. Our algorithm naturally works with function approximation to extend to
systems with potentially large state and observation spaces. We show that given
a realizable model class, the sample complexity of learning a near-optimal
policy only scales polynomially with respect to the statistical complexity of
the model class, without any explicit polynomial dependence on the size of the
state and observation spaces. Notably, ours is the first work to show
polynomial sample complexity for competing with the globally optimal policy in
PSRs. Finally, we demonstrate how our general theorem can be directly used to
derive sample complexity bounds for special models including $m$-step weakly
revealing and $m$-step decodable tabular POMDPs, POMDPs with low-rank latent
transition, and POMDPs with linear emission and latent transition.
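For concreteness, the PSR formalism the abstract refers to can be sketched as follows (a textbook-style summary in our own notation, not taken verbatim from the paper). Given a set of core tests $\{t_1, \dots, t_d\}$, each a future action-observation sequence, the predictive state at history $h$ is the vector of conditional probabilities

    q(h) = \big[\, \mathbb{P}(t_1 \mid h),\, \dots,\, \mathbb{P}(t_d \mid h) \,\big]^{\top},

and in a linear PSR the state updates via a learned observable operator $M_{ao}$ for each action-observation pair $(a, o)$:

    q(hao) = \frac{M_{ao}\, q(h)}{m_\infty^{\top} M_{ao}\, q(h)},

where $m_\infty$ is a fixed normalization vector. Every quantity here is a prediction about observable futures, which is what "defined entirely using observable quantities" means: no latent state is ever referenced.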
Related papers
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time, with the different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
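In symbols, the two models can be sketched as follows (our hedged reading of the summary, with feature map $\phi$ and weight vector $\theta$; the paper's exact notation may differ):

    Model I:  Q_c(s,a) \approx \phi_c(s,a)^{\top} \theta      (context-varying representation $\phi_c$, weights $\theta$ shared across contexts)
    Model II: Q_c(s,a) \approx \phi(s,a)^{\top} \theta_c      (representation $\phi$ shared across contexts, context-varying weights $\theta_c$)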
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
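Schematically, an eluder-type coefficient bounds cumulative out-of-sample error by in-sample training error; a simplified rendering (not the paper's exact definition) is

    \sum_{t=1}^{T} \mathcal{E}(f_t, \pi_{f_t}) \;\lesssim\; \sqrt{ d_{\mathrm{GEC}} \sum_{t=1}^{T} \sum_{s<t} \ell_s(f_t) } \;+\; \text{lower-order terms},

where $\mathcal{E}(f_t, \pi_{f_t})$ is the prediction gap of the hypothesis chosen at round $t$ under its own greedy policy and $\ell_s(f_t)$ is its training discrepancy on data gathered at earlier rounds. A small $d_{\mathrm{GEC}}$ means that fitting past data forces exploration progress, which is the exploration-exploitation tradeoff the coefficient quantifies.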
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, we propose a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA).
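The optimization-based exploration pattern that OPERA instantiates can be sketched as a generic loop (a hypothetical Python skeleton in our own notation; hypothesis_class, estimate_error, and rollout are assumed interfaces, not the paper's API): at every round, choose the most optimistic hypothesis among those whose estimation function is small on the data collected so far.

    # Illustrative sketch of optimization-based exploration
    # (not OPERA's actual pseudocode).
    def optimization_based_exploration(hypothesis_class, estimate_error,
                                       rollout, num_rounds, beta):
        dataset = []
        f_t = None
        for _ in range(num_rounds):
            # Version space: hypotheses whose estimation function is small
            # on the data gathered so far (the optimization constraint).
            # Assumes beta is large enough that this set is nonempty.
            feasible = [f for f in hypothesis_class
                        if estimate_error(f, dataset) <= beta]
            # Optimism in the face of uncertainty: pick the feasible
            # hypothesis with the highest predicted value at the start state.
            f_t = max(feasible, key=lambda f: f.predicted_value())
            # Collect fresh data with the chosen hypothesis's greedy policy.
            dataset.append(rollout(f_t.greedy_policy()))
        return f_t

The decomposable estimation function mentioned in the summary would plug in as estimate_error; the point of the pattern is that exploration is driven entirely by solving this constrained optimization, with no explicit bonus construction.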
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - Partially Observable RL with B-Stability: Unified Structural Condition
and Sharp Sample-Efficient Algorithms [25.658930892561735]
This paper advances all three aspects of Partially Observable RL in the general setting of Predictive State Representations (PSRs).
We propose a natural and unified structural condition for PSRs called B-stability. B-stable PSRs encompass the vast majority of known tractable subclasses.
We show that any B-stable PSR can be learned with a number of samples polynomial in the relevant problem parameters. When instantiated in the aforementioned subclasses, our sample complexities improve over prior results.
arXiv Detail & Related papers (2022-09-29T17:51:51Z) - Provably Efficient Reinforcement Learning in Partially Observable
Dynamical Systems [97.12538243736705]
We study Reinforcement Learning for partially observable dynamical systems using function approximation.
We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as POMDPs, observable Linear-Quadratic-Gaussian (LQG) control, and Predictive State Representations (PSRs), as well as newly introduced models such as Hilbert Space Embeddings of POMDPs and observable POMDPs with low-rank latent transition.
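The bilinear structure behind such frameworks can be written schematically (our simplified rendering; the paper's precise definition involves more bookkeeping): for each step $h$ there exist maps $W_h$ and $X_h$ such that an average-Bellman-error-like quantity factorizes as

    \mathcal{E}_h(f; \pi) = \big\langle W_h(f),\, X_h(\pi) \big\rangle,

so the unknown part $W_h(f)$ only needs to be controlled along the directions $X_h(\pi)$ that exploration can cover, and the rank of this factorization, rather than the size of the state or observation space, drives the sample complexity.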
arXiv Detail & Related papers (2022-06-24T00:27:42Z) - Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling
under Partial Coverage [33.766012922307084]
We study model-based offline Reinforcement Learning with general function approximation.
We present an algorithm named Constrained Pessimistic Policy Optimization (CPPO) which leverages a general function class and uses a constraint to encode pessimism.
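The pessimism mechanism can be written as a max-min program over a constrained model class (a schematic of the construction described above, in our notation):

    \hat{\pi} = \arg\max_{\pi} \; \min_{ M \in \mathcal{M} :\; \widehat{\mathbb{E}}_{\mathcal{D}}[\ell(M)] \le \beta } V_{M}^{\pi},

i.e., each candidate policy is evaluated under the worst-case model still consistent with the offline data, so the learner is only credited for value the data can certify. This is what allows PAC guarantees under partial coverage of the offline distribution.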
arXiv Detail & Related papers (2021-07-13T16:30:01Z) - Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
Our results accommodate general function approximation to scale to complex environments.
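A minimax representation-learning objective of this kind can be sketched as follows (a generic form in our notation, not necessarily the paper's exact objective): in a low-rank MDP the true features $\phi^{*}$ satisfy $\mathbb{E}[g(s') \mid s,a] = \phi^{*}(s,a)^{\top} w_g$ for every test function $g$, so one can fit

    \min_{\phi \in \Phi} \; \max_{g \in \mathcal{G}} \; \Big( \min_{w} \widehat{\mathbb{E}}_{\mathcal{D}} \big[ (\phi(s,a)^{\top} w - g(s'))^{2} \big] - \text{(best achievable error for } g \text{)} \Big),

where the inner regression error vanishes at $\phi^{*}$, making the objective a certificate that the learned representation serves every discriminator $g$ as well as the true one does.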
arXiv Detail & Related papers (2021-02-14T00:06:54Z)