Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes
- URL: http://arxiv.org/abs/2205.01053v1
- Date: Fri, 29 Apr 2022 16:53:00 GMT
- Title: Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes
- Authors: Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo
- Abstract summary: We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
- Score: 90.53326983143644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work aims at developing reinforcement learning algorithms that do not
rely on the Markov assumption. We consider the class of Non-Markov Decision
Processes where histories can be abstracted into a finite set of states while
preserving the dynamics. We call it a Markov abstraction since it induces a
Markov Decision Process over a set of states that encode the non-Markov
dynamics. This phenomenon underlies the recently introduced Regular Decision
Processes (as well as POMDPs where only a finite number of belief states is
reachable). In all such kinds of decision processes, an agent that uses a Markov
abstraction can rely on the Markov property to achieve optimal behaviour. We
show that Markov abstractions can be learned during reinforcement learning. Our
approach combines automata learning and classic reinforcement learning, and for
these two tasks any algorithms satisfying some basic requirements can be
employed. We show that our approach has PAC guarantees when the employed
algorithms have PAC guarantees, and we also provide an experimental evaluation.
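The core recipe, mapping each history to a finite abstract state and then running an ordinary RL algorithm over those states, can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the authors' implementation: the automaton encoding of the abstraction, the environment interface (`reset`/`step`), and all hyperparameters are hypothetical stand-ins.

```python
# Minimal sketch (not the authors' code): tabular Q-learning run on top of a
# Markov abstraction. The abstraction is a deterministic automaton whose state
# summarizes the history; the automaton contents, the environment interface
# (reset/step), and every hyperparameter here are illustrative assumptions.
import random
from collections import defaultdict

class MarkovAbstraction:
    """Abstract state = f(history), computed incrementally by an automaton."""
    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        self.transitions = transitions  # (abstract_state, observation) -> abstract_state

    def reset(self):
        return self.initial_state

    def step(self, state, observation):
        # Unseen (state, observation) pairs stay put; a learner would refine them.
        return self.transitions.get((state, observation), state)

def q_learning_over_abstraction(env, abstraction, actions,
                                episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Plain tabular Q-learning, but over abstract states instead of raw histories."""
    Q = defaultdict(float)
    for _ in range(episodes):
        obs = env.reset()                          # assumed environment interface
        s = abstraction.step(abstraction.reset(), obs)
        done = False
        while not done:
            if random.random() < epsilon:          # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            obs, reward, done = env.step(a)        # assumed to return (obs, reward, done)
            s_next = abstraction.step(s, obs)      # abstraction keeps the process Markov
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Because the automaton state summarizes the whole history, the induced process is Markov, so any PAC tabular learner could be plugged in where the Q-learning loop appears.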
Related papers
- Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z)
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop model-based imitation learning of state-only sequences with a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z)
- Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- PAC Reinforcement Learning Algorithm for General-Sum Markov Games [5.279475826661642]
The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games.
In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC.
arXiv Detail & Related papers (2020-09-05T21:54:27Z)
- Approximating Euclidean by Imprecise Markov Decision Processes [3.0017241250121383]
We investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations.
We show that for cost functions over finite time horizons the approximations become arbitrarily precise.
arXiv Detail & Related papers (2020-06-26T11:58:04Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is unknown and must be learnt.
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.