Learning in POMDPs is Sample-Efficient with Hindsight Observability
- URL: http://arxiv.org/abs/2301.13857v1
- Date: Tue, 31 Jan 2023 18:54:36 GMT
- Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
- Authors: Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
- Abstract summary: POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
In many realistic problems, more information is either revealed or can be computed during some point of the learning process.
We formulate a hindsight observable Markov decision process (HOMDP) as a POMDP where the latent states are revealed to the learner in hindsight and only during training.
- Score: 36.66596305441365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: POMDPs capture a broad class of decision making problems, but hardness
results suggest that learning is intractable even in simple settings due to the
inherent partial observability. However, in many realistic problems, more
information is either revealed or can be computed during some point of the
learning process. Motivated by diverse applications ranging from robotics to
data center scheduling, we formulate a hindsight observable Markov decision process (HOMDP) as a POMDP where
the latent states are revealed to the learner in hindsight and only during
training. We introduce new algorithms for the tabular and function
approximation settings that are provably sample-efficient with hindsight
observability, even in POMDPs that would otherwise be statistically
intractable. We give a lower bound showing that the tabular algorithm is
optimal in its dependence on latent state and observation cardinalities.
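To make the feedback model concrete, below is a minimal Python sketch of the interaction protocol the abstract describes. It is an illustration under stated assumptions, not the paper's construction: the class name ToyHOMDP, the two-state dynamics, and the 70% emission noise are all hypothetical. The key property is that the learner acts on observations alone, and the latent-state trajectory is revealed only after the episode ends, and only during training.

```python
import random

# Minimal sketch (hypothetical, not the paper's algorithm or API) of the
# hindsight-observable protocol: the agent sees only noisy observations
# while acting; latent states are revealed after the episode.

class ToyHOMDP:
    """Toy 2-state, 2-observation POMDP with hindsight observability."""

    def __init__(self, horizon=5):
        self.horizon = horizon

    def run_episode(self, policy):
        state = random.randint(0, 1)  # hidden latent state
        obs_traj, state_traj, total_reward = [], [], 0.0
        for _ in range(self.horizon):
            # Noisy emission: observation matches the state 70% of the time.
            obs = state if random.random() < 0.7 else 1 - state
            action = policy(obs)
            total_reward += 1.0 if action == state else 0.0
            obs_traj.append(obs)
            state_traj.append(state)  # logged, but hidden from the agent
            state = (state + action) % 2  # toy latent transition
        # Hindsight observability: latent states are revealed only now,
        # after the episode, and only during training.
        return obs_traj, state_traj, total_reward


env = ToyHOMDP()
emission_counts = {}  # empirical (latent state, observation) counts

for _ in range(100):
    obs_traj, state_traj, _ = env.run_episode(lambda obs: obs)
    # With latent states revealed in hindsight, the emission distribution
    # can be estimated by direct counting, with no latent-variable inference.
    for s, o in zip(state_traj, obs_traj):
        emission_counts[(s, o)] = emission_counts.get((s, o), 0) + 1
```

The final loop shows why hindsight observability helps: once latent states are revealed, quantities such as the emission distribution can be estimated by simple counting over (state, observation) pairs, sidestepping the latent-variable inference that makes generic POMDPs statistically intractable.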
Related papers
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Sample-Efficient Learning of POMDPs with Multiple Observations in Hindsight [105.6882315781987]
This paper studies the sample-efficiency of learning in partially observable Markov decision processes (POMDPs).
Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight".
We show that sample-efficient learning is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs.
arXiv Detail & Related papers (2023-07-06T09:39:01Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov decision processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- When Is Partially Observable Reinforcement Learning Not Scary? [30.754810416907123]
We show that learning POMDPs can require an exponential number of samples in the worst case, but that a rich subclass, weakly revealing POMDPs, is learnable with polynomially many samples.
This yields the first provably efficient result for learning from interactions in overcomplete POMDPs.
arXiv Detail & Related papers (2022-04-19T16:08:28Z)
- Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs whose latent states can be decoded from the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach, with a sample complexity that scales exponentially only in the short memory length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z)
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of partially observable Markov decision processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.