Online Learning for Unknown Partially Observable MDPs
- URL: http://arxiv.org/abs/2102.12661v1
- Date: Thu, 25 Feb 2021 03:36:13 GMT
- Title: Online Learning for Unknown Partially Observable MDPs
- Authors: Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar
- Abstract summary: We consider infinite-horizon average-cost POMDPs with unknown transition model, though known observation model.
We propose a natural posterior sampling-based reinforcement learning algorithm (POMDP-PSRL) and show that it achieves $O(T^{2/3})$ regret where $T$ is the time horizon.
- Score: 11.458853556386797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving Partially Observable Markov Decision Processes (POMDPs) is hard.
Learning optimal controllers for POMDPs when the model is unknown is harder.
Online learning of optimal controllers for unknown POMDPs, which requires
efficient learning using regret-minimizing algorithms that effectively trade off
exploration and exploitation, is even harder, and no solution exists currently.
In this paper, we consider infinite-horizon average-cost POMDPs with unknown
transition model, though known observation model. We propose a natural
posterior sampling-based reinforcement learning algorithm (POMDP-PSRL) and show
that it achieves $O(T^{2/3})$ regret where $T$ is the time horizon. To the best
of our knowledge, this is the first online RL algorithm for POMDPs with
sub-linear regret.
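The abstract describes POMDP-PSRL only at a high level, so the following is a minimal Python sketch of the posterior-sampling template it refers to: keep a posterior over the unknown transition model, sample a model at the start of each episode, plan against the sample, and act while filtering a belief with the known observation model. Everything concrete below (the Dirichlet posterior, the QMDP-style planner, the belief-weighted pseudo-count update, and the fixed episode length) is an illustrative assumption, not the paper's algorithm or the construction behind its $O(T^{2/3})$ regret bound.

```python
import numpy as np

# A minimal, illustrative posterior-sampling (PSRL-style) loop for a small
# tabular POMDP with a KNOWN observation model O[s, o] and an UNKNOWN
# transition model P[s, a, s'].  This is a sketch of the general template the
# abstract describes, not the paper's POMDP-PSRL algorithm: the episode
# schedule, planner, and posterior update below are simplified assumptions.

rng = np.random.default_rng(0)

n_states, n_actions, n_obs = 3, 2, 2
horizon, n_episodes = 50, 20

# Ground-truth model (unknown to the learner except for O and the cost c).
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
O = rng.dirichlet(np.ones(n_obs), size=n_states)       # known observation model
cost = rng.uniform(size=(n_states, n_actions))          # known cost function

# Dirichlet pseudo-counts over transitions: the learner's posterior.
alpha = np.ones((n_states, n_actions, n_states))

def plan_greedy(P, n_iters=100, gamma=0.95):
    """Crude planner: value iteration on the underlying MDP of the sampled
    model, used here as a stand-in for a belief-space planner."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.min(axis=1)
        Q = cost + gamma * P @ V
    return Q

def belief_update(b, a, o, P):
    """Bayes filter step under a (sampled) transition model and known O."""
    b_pred = b @ P[:, a, :]          # predicted next-state distribution
    b_new = b_pred * O[:, o]         # weight by observation likelihood
    return b_new / b_new.sum()

for ep in range(n_episodes):
    # 1) Sample a transition model from the Dirichlet posterior.
    P_sample = np.array([[rng.dirichlet(alpha[s, a]) for a in range(n_actions)]
                         for s in range(n_states)])
    Q = plan_greedy(P_sample)

    # 2) Roll out the greedy policy of the sampled model, tracking a belief.
    s = rng.integers(n_states)
    b = np.ones(n_states) / n_states
    for t in range(horizon):
        a = int(np.argmin(b @ Q))                    # greedy in belief space
        s_next = rng.choice(n_states, p=P_true[s, a])
        o = rng.choice(n_obs, p=O[s_next])

        # 3) Approximate posterior update: add expected transition counts
        #    under the current and predicted beliefs (a simplification; the
        #    paper handles the latent-state counting more carefully).
        b_next = belief_update(b, a, o, P_sample)
        alpha[:, a, :] += np.outer(b, b_next)

        s, b = s_next, b_next
```

The structural point the sketch preserves is that exploration comes entirely from resampling the model at episode boundaries, rather than from explicit exploration bonuses.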
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs)
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs)
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Posterior Sampling-based Online Learning for Episodic POMDPs [5.797837329787459]
We consider the online learning problem for episodic POMDPs with unknown transition and observation models.
We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs.
arXiv Detail & Related papers (2023-10-16T06:41:13Z)
- Learning Optimal Admission Control in Partially Observable Queueing Networks [4.254099382808599]
We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network.
We show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$.
arXiv Detail & Related papers (2023-08-04T15:40:23Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL)
We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation between reward-free and reward-aware RL does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver [32.212146650873194]
We provide approaches to learn an RL model efficiently without the guidance of a reward signal.
In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase.
We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking $\tilde{O}(d^2H^3/\epsilon^2)$ interactions with the environment.
arXiv Detail & Related papers (2021-10-07T07:59:50Z)
- Sublinear Regret for Learning POMDPs [5.675955495285045]
We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs)
We propose a learning algorithm for this problem, building on spectral method-of-moments estimations for hidden Markov models.
We establish a regret bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm where $T$ is the learning horizon.
arXiv Detail & Related papers (2021-07-08T06:59:39Z)
- Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping [99.59319332864129]
In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs)
We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde{O}(d\sqrt{T}/(1-\gamma)^2)$ regret.
Our upper and lower bound results together suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
arXiv Detail & Related papers (2020-06-23T17:08:54Z)
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs)
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)