Optimistic MLE -- A Generic Model-based Algorithm for Partially
Observable Sequential Decision Making
- URL: http://arxiv.org/abs/2209.14997v1
- Date: Thu, 29 Sep 2022 17:56:25 GMT
- Title: Optimistic MLE -- A Generic Model-based Algorithm for Partially
Observable Sequential Decision Making
- Authors: Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvari, Chi Jin
- Abstract summary: This paper introduces a simple, efficient learning algorithm for general sequential decision making.
We prove that OMLE learns near-optimal policies of an enormously rich class of sequential decision making problems.
- Score: 48.87943416098096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a simple, efficient learning algorithm for
general sequential decision making. The algorithm combines Optimism for
exploration with Maximum Likelihood Estimation for model estimation, and is
thus named OMLE. We prove that OMLE learns near-optimal policies for an
enormously rich class of sequential decision making problems with a polynomial
number of samples.
This rich class includes not only a majority of known tractable model-based
Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low
witness rank problems, tabular weakly-revealing/observable POMDPs and
multi-step decodable POMDPs), but also many new challenging RL problems
especially in the partially observable setting that were not previously known
to be tractable.
Notably, the new problems addressed by this paper include (1) observable
POMDPs with continuous observations and function approximation, where we
achieve the first sample complexity that is completely independent of the size
of the observation space; (2) well-conditioned low-rank sequential decision making
problems (also known as Predictive State Representations (PSRs)), which include
and generalize all known tractable POMDP examples under a more intrinsic
representation; (3) general sequential decision making problems under the SAIL
condition, which unifies our existing understanding of model-based RL in both
fully observable and partially observable settings. The SAIL condition,
identified in this paper, can be viewed as a natural generalization of
Bellman/witness rank that addresses partial observability.
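
The abstract describes OMLE as an alternation between maximum likelihood model
estimation and optimistic planning over a confidence set of candidate models.
The sketch below illustrates that loop in Python; it is a minimal illustration
under assumed interfaces, not the paper's implementation. The objects
`model_class`, `env`, the methods `log_likelihood`, `optimal_policy`, `value`,
`rollout`, and the threshold `beta` are all hypothetical placeholders.

```python
def omle(model_class, env, num_iterations, beta):
    """Minimal sketch of an Optimism + MLE loop.

    All interfaces used here (model.log_likelihood, model.optimal_policy,
    model.value, env.rollout) are hypothetical and exist only to show the
    structure of the algorithm described in the abstract.
    """
    dataset = []   # trajectories collected so far
    policy = None
    for _ in range(num_iterations):
        # Step 1 (MLE): score every candidate model on the data collected so far.
        loglik = {m: sum(m.log_likelihood(traj) for traj in dataset)
                  for m in model_class}
        max_loglik = max(loglik.values())

        # Step 2 (confidence set): keep the models whose total log-likelihood
        # is within beta of the maximum.
        confidence_set = [m for m in model_class
                          if loglik[m] >= max_loglik - beta]

        # Step 3 (optimism): among those models, adopt the policy of the model
        # with the highest predicted value.
        best_model = max(confidence_set,
                         key=lambda m: m.value(m.optimal_policy()))
        policy = best_model.optimal_policy()

        # Step 4 (data collection): run the optimistic policy in the real
        # environment and record the observed trajectory.
        dataset.append(env.rollout(policy))
    return policy
```

Roughly, `beta` would be chosen so that the true model remains in the
confidence set with high probability, which is what allows the optimistic
value to upper-bound the optimal value during exploration; the exact choice
and guarantees are in the paper itself.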
Related papers
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Partially Observable RL with B-Stability: Unified Structural Condition
and Sharp Sample-Efficient Algorithms [25.658930892561735]
This paper advances all three aspects of Partially Observable RL in the general setting of Predictive State Representations (PSRs).
We propose a natural and unified structural condition for PSRs called B-stability. B-stable PSRs encompass the vast majority of known tractable subclasses.
We show that any B-stable PSR can be learned with a number of samples polynomial in the relevant problem parameters. When instantiated in the aforementioned subclasses, the resulting sample complexities improve over the current best ones.
arXiv Detail & Related papers (2022-09-29T17:51:51Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in all relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - When Is Partially Observable Reinforcement Learning Not Scary? [30.754810416907123]
Learning partially observable Markov decision processes (POMDPs) is known to require an exponential number of samples in the worst case.
This paper identifies rich subclasses of POMDPs (weakly revealing POMDPs) that can be learned sample-efficiently, giving the first provably efficient results for learning from interactions in overcomplete POMDPs.
arXiv Detail & Related papers (2022-04-19T16:08:28Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially in the short memory length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit
Partial Observability [92.95794652625496]
Generalization is a central challenge for the deployment of reinforcement learning systems.
We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability.
We recast the problem of generalization in RL as solving the induced partially observed Markov decision process.
arXiv Detail & Related papers (2021-07-13T17:59:25Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that known hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.