Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
- URL: http://arxiv.org/abs/2306.12356v1
- Date: Wed, 21 Jun 2023 16:04:03 GMT
- Title: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
- Authors: Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
- Abstract summary: We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
- Score: 81.00800920928621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study representation learning in partially observable
Markov Decision Processes (POMDPs), where the agent learns a decoder function
that maps a series of high-dimensional raw observations to a compact
representation and uses it for more efficient exploration and planning.
We focus our attention on the sub-classes of \textit{$\gamma$-observable} and
\textit{decodable POMDPs}, for which it has been shown that statistically
tractable learning is possible, but there has not been any computationally
efficient algorithm. We first present an algorithm for decodable POMDPs that
combines maximum likelihood estimation (MLE) and optimism in the face of
uncertainty (OFU) to perform representation learning and achieve efficient
sample complexity, while only calling supervised learning computational
oracles. We then show how to adapt this algorithm to also work in the broader
class of $\gamma$-observable POMDPs.
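For intuition, a POMDP is $\gamma$-observable (in the sense used in prior work on observable POMDPs) when its emission matrix $\mathbb{O}$ separates beliefs, i.e. $\|\mathbb{O}^\top b - \mathbb{O}^\top b'\|_1 \ge \gamma \|b - b'\|_1$ for all belief vectors $b, b'$; decodable POMDPs are the special case where a short window of observations identifies the latent state exactly. Below is a minimal, hedged sketch of the MLE + OFU loop the abstract describes, instantiated on a toy two-armed bandit rather than a full POMDP so that it runs as-is; all names (CandidateModel, mle_ofu, beta) are illustrative assumptions, not the paper's algorithm or API.

```python
# Minimal sketch (assumption, not the paper's actual algorithm): an
# MLE + OFU loop on a toy two-armed Gaussian bandit. The structure --
# fit candidates by maximum likelihood, keep a near-maximal confidence
# set, act optimistically within it -- mirrors the loop described above.
import random

class CandidateModel:
    """One hypothesis about the environment: a mean reward per action."""
    def __init__(self, means):
        self.means = means  # means[a] = hypothesized mean reward of action a

    def log_prob(self, action, reward, sigma=1.0):
        # Gaussian log-likelihood of an observed reward under this model.
        return -0.5 * ((reward - self.means[action]) / sigma) ** 2

def true_reward(action):
    # Ground-truth environment, unknown to the learner: action 1 is better.
    return random.gauss((0.2, 0.8)[action], 1.0)

def mle_ofu(models, num_rounds=200, beta=5.0):
    data, action = [], 0  # observed (action, reward) pairs; initial policy
    for _ in range(num_rounds):
        data.append((action, true_reward(action)))
        # (1) Supervised-learning oracle: log-likelihood of each candidate.
        loglik = [sum(m.log_prob(a, r) for a, r in data) for m in models]
        best = max(loglik)
        # (2) Confidence set: candidates whose likelihood is near-maximal.
        plausible = [m for m, ll in zip(models, loglik) if ll >= best - beta]
        # (3) Optimism in the face of uncertainty: play the action with the
        # highest value under ANY plausible model.
        action = max(range(2), key=lambda a: max(m.means[a] for m in plausible))
    return action

models = [CandidateModel([x / 4, y / 4]) for x in range(5) for y in range(5)]
print("action chosen after learning:", mle_ofu(models))
```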
Related papers
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics (a toy sketch of this objective appears after this list).
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Learning in POMDPs is Sample-Efficient with Hindsight Observability [36.66596305441365]
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
In many realistic problems, more information is either revealed or can be computed at some point during the learning process.
We formulate a hindsight observability setting as a POMDP in which the latent states are revealed to the learner in hindsight, but only during training.
arXiv Detail & Related papers (2023-01-31T18:54:36Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
These bounds yield a separation that underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation to scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z)
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that known hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
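As referenced in the MusIK entry above, here is a minimal, hedged sketch of a multi-step inverse kinematics objective: learn an encoder by predicting the first action taken between two observations $k$ steps apart. The tiny encoder, classifier, and toy dynamics below are assumptions for illustration only, not the MusIK implementation.

```python
# Hedged sketch (not the MusIK implementation): a multi-step inverse
# kinematics loss -- predict the action a_t from observations x_t and
# x_{t+k} -- on a toy 1-D random walk with noisy observations.
import math
import random

def inverse_kinematics_loss(encoder, classifier, trajectory, k):
    """Average cross-entropy of predicting a_t from (x_t, x_{t+k}).

    trajectory: list of (observation, action) pairs.
    encoder:    maps a raw observation to a compact representation.
    classifier: maps a pair of representations to action probabilities.
    """
    loss, count = 0.0, 0
    for t in range(len(trajectory) - k):
        (x_t, a_t), (x_tk, _) = trajectory[t], trajectory[t + k]
        probs = classifier(encoder(x_t), encoder(x_tk))
        loss -= math.log(probs[a_t] + 1e-12)
        count += 1
    return loss / max(count, 1)

# Toy instantiation: the latent state is an integer, the observation adds
# noise (so raw observations are "rich" but decodable), and the encoder
# recovers the latent state by rounding.
def encoder(x):
    return round(x)

def classifier(z_t, z_tk):
    # Two actions: 0 = step left, 1 = step right. The drift z_{t+k} - z_t
    # is (weakly) informative about the first action; softmax over it.
    p_right = 1.0 / (1.0 + math.exp(-(z_tk - z_t)))
    return [1.0 - p_right, p_right]

state, trajectory = 0, []
for _ in range(50):
    action = random.randint(0, 1)
    trajectory.append((state + random.uniform(-0.3, 0.3), action))
    state += 1 if action == 1 else -1

print("k=3 loss:", inverse_kinematics_loss(encoder, classifier, trajectory, k=3))
```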