Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes
- URL: http://arxiv.org/abs/2205.01053v1
- Date: Fri, 29 Apr 2022 16:53:00 GMT
- Title: Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes
- Authors: Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo
- Abstract summary: We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
- Score: 90.53326983143644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work aims at developing reinforcement learning algorithms that do not
rely on the Markov assumption. We consider the class of Non-Markov Decision
Processes where histories can be abstracted into a finite set of states while
preserving the dynamics. We call it a Markov abstraction since it induces a
Markov Decision Process over a set of states that encode the non-Markov
dynamics. This phenomenon underlies the recently introduced Regular Decision
Processes (as well as POMDPs where only a finite number of belief states is
reachable). In all such kinds of decision processes, an agent that uses a Markov
abstraction can rely on the Markov property to achieve optimal behaviour. We
show that Markov abstractions can be learned during reinforcement learning. Our
approach combines automata learning and classic reinforcement learning, and for
these two tasks any algorithms satisfying some basic requirements can be
employed. We show that our approach has PAC guarantees when the employed
algorithms have PAC guarantees, and we also provide an experimental evaluation.
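The core recipe, mapping each history to a finite abstract state and then running an ordinary RL algorithm over those states, can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the authors' implementation: the automaton encoding of the abstraction, the environment interface (`reset`/`step`), and all hyperparameters are hypothetical stand-ins.

```python
# Minimal sketch (not the authors' code): tabular Q-learning run on top of a
# Markov abstraction. The abstraction is a deterministic automaton whose state
# summarizes the history; the automaton contents, the environment interface
# (reset/step), and every hyperparameter here are illustrative assumptions.
import random
from collections import defaultdict

class MarkovAbstraction:
    """Abstract state = f(history), computed incrementally by an automaton."""
    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        self.transitions = transitions  # (abstract_state, observation) -> abstract_state

    def reset(self):
        return self.initial_state

    def step(self, state, observation):
        # Unseen (state, observation) pairs stay put; a learner would refine them.
        return self.transitions.get((state, observation), state)

def q_learning_over_abstraction(env, abstraction, actions,
                                episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Plain tabular Q-learning, but over abstract states instead of raw histories."""
    Q = defaultdict(float)
    for _ in range(episodes):
        obs = env.reset()                          # assumed environment interface
        s = abstraction.step(abstraction.reset(), obs)
        done = False
        while not done:
            if random.random() < epsilon:          # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            obs, reward, done = env.step(a)        # assumed to return (obs, reward, done)
            s_next = abstraction.step(s, obs)      # abstraction keeps the process Markov
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Because the automaton state summarizes the whole history, the induced process is Markov, so any PAC tabular learner could be plugged in where the Q-learning loop appears.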
Related papers
- Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z)
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop model-based imitation learning of state-only sequences with a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z)
- Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- PAC Reinforcement Learning Algorithm for General-Sum Markov Games [5.279475826661642]
The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games.
In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC.
arXiv Detail & Related papers (2020-09-05T21:54:27Z)
- Approximating Euclidean by Imprecise Markov Decision Processes [3.0017241250121383]
We investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations.
We show that for cost functions over finite time horizons the approximations become arbitrarily precise.
arXiv Detail & Related papers (2020-06-26T11:58:04Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is unknown and must be learnt.
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.