When Is Partially Observable Reinforcement Learning Not Scary?
- URL: http://arxiv.org/abs/2204.08967v1
- Date: Tue, 19 Apr 2022 16:08:28 GMT
- Title: When Is Partially Observable Reinforcement Learning Not Scary?
- Authors: Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
- Abstract summary: While learning partially observable Markov decision processes (POMDPs) can require an exponential number of samples in the worst case, we identify a tractable subclass, weakly revealing POMDPs, for which a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) achieves polynomial sample complexity.
This is the first provably sample-efficient result for learning from interactions in overcomplete POMDPs.
- Score: 30.754810416907123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applications of Reinforcement Learning (RL), in which agents learn to make a
sequence of decisions despite lacking complete information about the latent
states of the controlled system, that is, they act under partial observability
of the states, are ubiquitous. Partially observable RL can be notoriously
difficult -- well-known information-theoretic results show that learning
partially observable Markov decision processes (POMDPs) requires an exponential
number of samples in the worst case. Yet, this does not rule out the existence
of large subclasses of POMDPs over which learning is tractable.
In this paper we identify such a subclass, which we call weakly revealing
POMDPs. This family rules out the pathological instances of POMDPs where
observations are uninformative to a degree that makes learning hard. We prove
that for weakly revealing POMDPs, a simple algorithm combining optimism and
Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial
sample complexity. To the best of our knowledge, this is the first provably
sample-efficient result for learning from interactions in overcomplete POMDPs,
where the number of latent states can be larger than the number of
observations.
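To give a concrete picture of the recipe the abstract describes, below is a minimal, hypothetical sketch of the optimism-plus-MLE template (in the spirit of the Optimistic MLE entry under related papers), not the paper's actual algorithm or guarantees. Roughly speaking, the weakly revealing condition rules out emission processes that are so uninformative that observations say almost nothing about the latent state. In the sketch, the toy POMDP, the candidate model class, the restriction to open-loop plans, and the confidence width BETA are all illustrative assumptions.

```python
# Illustrative sketch only: a toy "optimism + MLE" loop over a tiny candidate
# class of POMDP models. The environment, model class, open-loop plans, and the
# confidence width BETA are hypothetical choices, not taken from the paper.
import itertools
import numpy as np

rng = np.random.default_rng(0)
H, S, A, O = 3, 2, 2, 2     # horizon, latent states, actions, observations
BETA = 2.0                  # hypothetical log-likelihood confidence width

def make_model(eps, slip):
    """Tiny parametric POMDP family; eps controls observation informativeness."""
    T = np.zeros((A, S, S))                       # T[a, s, s'] = P(s' | s, a)
    T[0] = [[1 - slip, slip], [slip, 1 - slip]]   # action 0: mostly stay
    T[1] = [[slip, 1 - slip], [1 - slip, slip]]   # action 1: mostly switch
    Z = np.array([[0.5 + eps, 0.5 - eps],         # Z[s, o] = P(o | s)
                  [0.5 - eps, 0.5 + eps]])
    R = np.array([[0.0, 0.0], [1.0, 0.0]])        # R[s, a]: action 0 pays in state 1
    mu = np.array([1.0, 0.0])                     # initial latent-state distribution
    return dict(T=T, Z=Z, R=R, mu=mu)

models = [make_model(e, s) for e in (0.1, 0.4) for s in (0.1, 0.4)]
true_model = make_model(0.4, 0.1)                 # unknown to the learner

def rollout(model, actions):
    """Sample one episode of a fixed (open-loop) plan from a POMDP."""
    s, obs, ret = rng.choice(S, p=model["mu"]), [], 0.0
    for a in actions:
        ret += model["R"][s, a]
        s = rng.choice(S, p=model["T"][a, s])
        obs.append(rng.choice(O, p=model["Z"][s]))
    return obs, ret

def log_lik(model, actions, obs):
    """Log-likelihood of an observation sequence (forward algorithm)."""
    b, ll = model["mu"].copy(), 0.0
    for a, o in zip(actions, obs):
        b = b @ model["T"][a]                     # predict the next latent state
        p = float(b @ model["Z"][:, o])           # probability of this observation
        ll += np.log(p + 1e-12)
        b = b * model["Z"][:, o] / (p + 1e-12)    # Bayes update of the belief
    return ll

def value(model, actions):
    """Expected return of an open-loop plan under a candidate model."""
    b, v = model["mu"].copy(), 0.0
    for a in actions:
        v += float(b @ model["R"][:, a])
        b = b @ model["T"][a]
    return v

plans = list(itertools.product(range(A), repeat=H))
history = []                                      # (actions, observations) pairs

for episode in range(20):
    # 1. MLE confidence set: keep models whose likelihood is near the maximum.
    lls = [sum(log_lik(m, a, o) for a, o in history) for m in models]
    confident = [m for m, ll in zip(models, lls) if ll >= max(lls) - BETA]
    # 2. Optimism: the (model, plan) pair promising the largest value wins.
    opt_model, plan = max(((m, p) for m in confident for p in plans),
                          key=lambda mp: value(*mp))
    # 3. Execute the optimistic plan for real and add the data to the history.
    obs, ret = rollout(true_model, plan)
    history.append((plan, obs))
    print(f"episode {episode:2d}: plan={plan}  return={ret:.1f}")
```

In this toy run the confidence set shrinks as trajectories accumulate and the executed plans concentrate on genuinely high-value ones, which is the behaviour the optimism-plus-MLE template is designed to produce; the paper's analysis makes this precise for weakly revealing POMDPs.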
Related papers
- Sample-Efficient Learning of POMDPs with Multiple Observations In
Hindsight [105.6882315781987]
This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs).
Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight".
We show that sample-efficient learning is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs.
arXiv Detail & Related papers (2023-07-06T09:39:01Z) - Learning in POMDPs is Sample-Efficient with Hindsight Observability [36.66596305441365]
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
In many realistic problems, more information is either revealed or can be computed during some point of the learning process.
We formulate this setting as a POMDP where the latent states are revealed to the learner in hindsight, and only during training.
arXiv Detail & Related papers (2023-01-31T18:54:36Z) - Optimistic MLE -- A Generic Model-based Algorithm for Partially
Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple and efficient learning algorithm, Optimistic MLE (OMLE), for general sequential decision making.
We prove that OMLE learns near-optimal policies of an enormously rich class of sequential decision making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z) - Partially Observable RL with B-Stability: Unified Structural Condition
and Sharp Sample-Efficient Algorithms [25.658930892561735]
This paper advances all three aspects of Partially Observable RL in the general setting of Predictive State Representations (PSRs).
We propose a natural and unified structural condition for PSRs called B-stability. B-stable PSRs encompass the vast majority of known tractable subclasses.
We show that any B-stable PSR can be learned with a number of samples polynomial in the relevant problem parameters. When instantiated in the aforementioned subclasses, our sample complexities improve.
arXiv Detail & Related papers (2022-09-29T17:51:51Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially only in the short memory length $m$, rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z) - Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit
Partial Observability [92.95794652625496]
Generalization is a central challenge for the deployment of reinforcement learning systems.
We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability.
We recast the problem of generalization in RL as solving the induced partially observed Markov decision process.
arXiv Detail & Related papers (2021-07-13T17:59:25Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)