Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
- URL: http://arxiv.org/abs/2006.12484v2
- Date: Sun, 25 Oct 2020 03:22:02 GMT
- Title: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
- Authors: Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu
- Abstract summary: This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
- Score: 91.40308354344505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partial observability is a common challenge in many reinforcement learning
applications, which requires an agent to maintain memory, infer latent states,
and integrate this past information into exploration. This challenge leads to a
number of computational and statistical hardness results for learning general
Partially Observable Markov Decision Processes (POMDPs). This work shows that
these hardness barriers do not preclude efficient reinforcement learning for
rich and interesting subclasses of POMDPs. In particular, we present a
sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs,
where the number of observations is larger than the number of latent states and
where exploration is essential for learning, thus distinguishing our results
from prior works. OOM-UCB achieves an optimal sample complexity of
$\tilde{\mathcal{O}}(1/\varepsilon^2)$ for finding an $\varepsilon$-optimal
policy, along with being polynomial in all other relevant quantities. As an
interesting special case, we also provide a computationally and statistically
efficient algorithm for POMDPs with deterministic state transitions.
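To make the undercompleteness condition concrete: with $S$ latent states and $O \ge S$ observations per step, the emission matrix has full column rank, and its smallest singular value measures how informative observations are about the latent state. The following is a minimal illustrative sketch of that structural check only; the toy POMDP arrays, the threshold `alpha`, and the function name are assumptions for illustration, not the paper's OOM-UCB algorithm or code.

```python
import numpy as np

# Toy undercomplete POMDP layer: S latent states, O >= S observations, A actions.
S, O, A = 3, 5, 2
rng = np.random.default_rng(0)

# Emission matrix: column s is the distribution P(observation | latent state s).
emission = rng.dirichlet(np.ones(O), size=S).T        # shape (O, S)

# Transition tensor: transition[s, a] is the distribution P(next state | s, a).
transition = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)

def is_undercomplete(emission: np.ndarray, alpha: float = 1e-3) -> bool:
    """Undercompleteness check used in this line of work: at least as many
    observations as latent states, and an emission matrix whose smallest
    singular value is bounded away from zero (full column rank)."""
    n_obs, n_states = emission.shape
    sigma_min = np.linalg.svd(emission, compute_uv=False)[-1]
    return bool(n_obs >= n_states and sigma_min >= alpha)

print(is_undercomplete(emission))  # True for a generic random emission matrix
```

The snippet only illustrates the structural assumption; the paper's algorithm additionally builds observable-operator estimates with upper confidence bounds to drive exploration.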
Related papers
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency [105.17746223041954]
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges.
It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon.
We propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy.
arXiv Detail & Related papers (2022-05-26T16:34:46Z)
- When Is Partially Observable Reinforcement Learning Not Scary? [30.754810416907123]
We show that learning general partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case, but that the subclass of weakly revealing POMDPs can be learned sample-efficiently.
This is the first provably efficient result for learning from interactions in overcomplete POMDPs.
arXiv Detail & Related papers (2022-04-19T16:08:28Z)
- Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded from the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach, with a sample complexity that scales exponentially in the short length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments (a minimal illustrative sketch of this windowed-memory idea follows the list below).
arXiv Detail & Related papers (2022-02-08T16:39:57Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
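As a companion to the short-term-memory entry above, here is a minimal illustrative sketch of the windowed-memory idea: the most recent $m$ observations and actions are treated as a surrogate state, so the number of such surrogate states (and hence the cost of tabular methods run on top of them) grows exponentially in $m$ rather than in the horizon. The wrapper class, its interface, and all names are assumptions for illustration, not code from that paper.

```python
from collections import deque
from typing import Deque, Hashable, Optional, Tuple

class ShortTermMemoryWrapper:
    """Illustrative wrapper: expose the last m (observation, action) pairs
    as a surrogate state for a partially observed environment."""

    def __init__(self, m: int):
        self.m = m
        self.window: Deque[Tuple[Hashable, Optional[Hashable]]] = deque(maxlen=m)

    def reset(self, first_obs: Hashable) -> Tuple:
        """Start an episode; the initial surrogate state holds only the first observation."""
        self.window.clear()
        self.window.append((first_obs, None))
        return tuple(self.window)

    def step(self, action: Hashable, next_obs: Hashable) -> Tuple:
        """Record the chosen action and the next observation; the returned
        tuple is the new surrogate state (a sliding window of length <= m)."""
        self.window.append((next_obs, action))
        return tuple(self.window)

# Usage sketch: with |O| observations and |A| actions, the surrogate state
# space has at most (|O| * |A|)**m elements, i.e. exponential in m only.
wrapper = ShortTermMemoryWrapper(m=2)
state = wrapper.reset(first_obs="o0")
state = wrapper.step(action="a0", next_obs="o1")
```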