Related papers: Near-Optimal Learning and Planning in Separated Latent MDPs

Near-Optimal Learning and Planning in Separated Latent MDPs

URL: http://arxiv.org/abs/2406.07920v1
Date: Wed, 12 Jun 2024 06:41:47 GMT
Title: Near-Optimal Learning and Planning in Separated Latent MDPs
Authors: Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin,
Abstract summary: We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs) In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
Score: 70.88315649628251
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp *statistical threshold* for the horizon length necessary for efficient learning. On the computational side, we show that under a weaker assumption of separability under the optimal policy, there is a quasi-polynomial algorithm with time complexity scaling in terms of the statistical threshold. We further show a near-matching time complexity lower bound under the exponential time hypothesis.

Related papers

Learning Intersections of Two Margin Halfspaces under Factorizable Distributions [56.51474048985742]
Learning intersections of halfspaces is a central problem in Computational Learning Theory.<n>Even for just two halfspaces, it remains a major open question whether learning is possible in time.<n>We introduce a novel algorithm that provably circumvents the CSQ hardness barrier.
arXiv Detail & Related papers (2025-11-13T00:28:24Z)
Scalable Policy-Based RL Algorithms for POMDPs [6.2229686397601585]
We consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating a POMDP model into a finite-state Markov Decision Process (MDP)<n>We show that the approximation error decreases exponentially with the length of this history.<n>To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.
arXiv Detail & Related papers (2025-10-08T00:33:38Z)
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models [20.314426291330278]
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution.
arXiv Detail & Related papers (2024-04-23T21:02:58Z)
Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy. We demonstrate that the sample complexity of the PEDEL algorithm of citeWagenmaker22linearMDP closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z)
Posterior Sampling-based Online Learning for Episodic POMDPs [5.797837329787459]
We consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs.
arXiv Detail & Related papers (2023-10-16T06:41:13Z)
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs) We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) We then show how to adapt this algorithm to also work in the broader class of $gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid. We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z)
Continual Learning In Environments With Polynomial Mixing Times [13.533984338434106]
We study the effect of mixing times on learning in continual reinforcement learning. We propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward.
arXiv Detail & Related papers (2021-12-13T23:41:56Z)
RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDP) In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF. We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Decision Processes (POMDPs) We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.