Learning and Solving Regular Decision Processes
- URL: http://arxiv.org/abs/2003.01008v1
- Date: Mon, 2 Mar 2020 16:36:16 GMT
- Title: Learning and Solving Regular Decision Processes
- Authors: Eden Abadi, Ronen I. Brafman
- Abstract summary: Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards.
We combine automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS.
- Score: 15.533842336139067
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Regular Decision Processes (RDPs) are a recently introduced model that
extends MDPs with non-Markovian dynamics and rewards. The non-Markovian
behavior is restricted to depend on regular properties of the history. These
can be specified using regular expressions or formulas in linear dynamic logic
over finite traces. Fully specified RDPs can be solved by compiling them into
an appropriate MDP. Learning RDPs from data is a challenging problem that has
yet to be addressed, on which we focus in this paper. Our approach rests on a
new representation for RDPs using Mealy Machines that emit a distribution and
an expected reward for each state-action pair. Building on this representation,
we combine automata learning techniques with history clustering to learn such a
Mealy machine and solve it by adapting MCTS to it. We empirically evaluate this
approach, demonstrating its feasibility.
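To make the representation concrete, the sketch below shows one possible way to encode such a Mealy machine in Python: automaton states summarize the regular property of the history, transitions are driven by (action, observation) pairs, and each (state, action) pair emits a next-observation distribution and an expected reward. All names are hypothetical and the code only illustrates the representation; it is not the authors' implementation.

```python
import random

class RDPMealyMachine:
    """Illustrative Mealy-machine representation of an RDP (hypothetical names)."""

    def __init__(self, initial_state, transitions, outputs):
        # transitions: (state, action, observation) -> next automaton state
        # outputs:     (state, action) -> (observation distribution, expected reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.outputs = outputs

    def step(self, state, action):
        """Sample an observation and reward, then advance the automaton state."""
        obs_dist, expected_reward = self.outputs[(state, action)]
        observations, probabilities = zip(*obs_dist.items())
        obs = random.choices(observations, weights=probabilities, k=1)[0]
        return obs, expected_reward, self.transitions[(state, action, obs)]

# Toy RDP: the reward of action "a" depends on whether observation "x" has
# ever been seen -- a non-Markovian, but regular, property of the history.
machine = RDPMealyMachine(
    initial_state="never_seen_x",
    transitions={
        ("never_seen_x", "a", "x"): "seen_x",
        ("never_seen_x", "a", "y"): "never_seen_x",
        ("seen_x", "a", "x"): "seen_x",
        ("seen_x", "a", "y"): "seen_x",
    },
    outputs={
        ("never_seen_x", "a"): ({"x": 0.5, "y": 0.5}, 0.0),
        ("seen_x", "a"): ({"x": 0.5, "y": 0.5}, 1.0),
    },
)

state = machine.initial_state
for _ in range(5):
    observation, reward, state = machine.step(state, "a")
```

Because the machine's states are Markovian by construction, a sampling-based planner such as MCTS can be run directly over (machine state, action) pairs once the machine has been learned.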
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs)
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
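A simple way to picture that reconstruction step is to cluster history prefixes whose empirical next-observation statistics are indistinguishable and treat each cluster as a candidate automaton state. The sketch below only illustrates this general idea under simplifying assumptions (a fixed tolerance, no consistency check on transitions); it is not the algorithm of this related paper nor of the main paper above.

```python
from collections import Counter, defaultdict

def cluster_histories(episodes, tolerance=0.1):
    """Cluster history prefixes by their empirical next-(action, observation)
    statistics. Illustrative only; real automata-learning algorithms also
    enforce consistency of the transitions between the learned states."""
    counts = defaultdict(Counter)
    for episode in episodes:  # episode: [(action, observation), ...]
        for t in range(len(episode)):
            prefix = tuple(episode[:t])
            counts[prefix][episode[t]] += 1

    def distribution(prefix):
        total = sum(counts[prefix].values())
        return {key: n / total for key, n in counts[prefix].items()}

    def close(p, q):
        keys = set(p) | set(q)
        return all(abs(p.get(k, 0.0) - q.get(k, 0.0)) <= tolerance for k in keys)

    clusters, assignment = [], {}
    for prefix in counts:
        dist = distribution(prefix)
        for representative, cluster_id in clusters:
            if close(dist, representative):
                assignment[prefix] = cluster_id
                break
        else:
            clusters.append((dist, len(clusters)))
            assignment[prefix] = len(clusters) - 1
    return assignment

episodes = [[("a", "x"), ("a", "y")], [("a", "y"), ("a", "y")]]
candidate_states = cluster_histories(episodes)
```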
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
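For a concrete instance of a regularized MDP, a negative-entropy regularizer turns the Bellman max into a log-sum-exp ("soft") backup. The sketch below shows such a soft value iteration in a tabular setting; it is a generic illustration of regularization with hypothetical names, not the specific robust/regularized equivalence established in the paper.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.95, beta=5.0, iters=500):
    """Value iteration with an entropy-regularized (soft) Bellman operator.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A).
    beta is the inverse regularization strength; beta -> infinity recovers
    the usual hard max.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * (P @ v)                # Q-values, shape (S, A)
        m = q.max(axis=1)                      # stabilize the log-sum-exp
        v = m + np.log(np.exp(beta * (q - m[:, None])).sum(axis=1)) / beta
    policy = np.exp(beta * (q - v[:, None]))   # softmax (Boltzmann) policy
    return v, policy / policy.sum(axis=1, keepdims=True)
```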
arXiv Detail & Related papers (2023-03-12T13:03:28Z)
- BATS: Best Action Trajectory Stitching [22.75880303352508]
We introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset.
We prove that this property allows one to derive upper and lower bounds on the value function, up to appropriate distance metrics.
We show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
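The core data manipulation can be pictured as follows: logged transitions define a tabular MDP, and extra "stitching" edges connect a transition's end state to other logged states that lie close to it. The code below is a simplified nearest-neighbour illustration with hypothetical names; BATS itself relies on a learned dynamics model and penalizes the uncertainty of added transitions.

```python
import numpy as np

def build_stitched_mdp(transitions, stitch_radius=0.1):
    """Form a tabular MDP over logged data and add stitching edges.

    transitions: list of (state, action, reward, next_state) tuples with
    states given as 1-D numpy arrays. Simplified sketch, not the BATS
    algorithm: new edges simply connect a logged next-state to any other
    logged state within stitch_radius.
    """
    # Index every distinct (rounded) state that appears in the log.
    index = {}
    for s, _, _, s2 in transitions:
        for state in (s, s2):
            index.setdefault(tuple(np.round(state, 3)), len(index))

    def sid(state):
        return index[tuple(np.round(state, 3))]

    edges = []  # (state_id, action, reward, next_state_id, is_stitched)
    for s, a, r, s2 in transitions:
        edges.append((sid(s), a, r, sid(s2), False))
        for key, j in index.items():
            if 0.0 < np.linalg.norm(np.array(key) - s2) <= stitch_radius:
                edges.append((sid(s), a, r, j, True))
    return index, edges
```

Planning on the stitched tabular MDP and then behavior-cloning its optimal policy is the rough shape of the pipeline the summary describes.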
arXiv Detail & Related papers (2022-04-26T01:48:32Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
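What distinguishes the semi-Markov setting is that actions run for variable durations, so the bootstrap term is discounted by gamma raised to the elapsed time. Below is a minimal tabular sketch of that update with hypothetical state and action names; the paper's SDQN, SDDQN, and SBCQ are deep, offline variants of this idea rather than this exact update.

```python
from collections import defaultdict

def smdp_q_update(Q, s, a, cumulative_reward, duration, s_next, actions,
                  gamma=0.99, alpha=0.1):
    """One tabular semi-MDP Q-learning update.

    cumulative_reward: reward accumulated while action a was running for
    `duration` time steps; the bootstrap is discounted by gamma ** duration
    instead of a single gamma factor.
    """
    target = cumulative_reward + (gamma ** duration) * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Usage on a single logged transition (hypothetical warfarin-dosing names):
Q = defaultdict(float)
Q = smdp_q_update(Q, s="inr_low", a="increase_dose",
                  cumulative_reward=1.5, duration=3, s_next="inr_in_range",
                  actions=["increase_dose", "hold_dose"])
```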
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Solving the non-preemptive two queue polling model with generally distributed service and switch-over durations and Poisson arrivals as a Semi-Markov Decision Process [0.0]
The polling system with switch-over durations is a useful model with several practical applications.
It is classified as a Discrete Event Dynamic System (DEDS), for which no single agreed-upon modelling approach exists.
This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system so as to introduce additional modelling power.
arXiv Detail & Related papers (2021-12-13T11:40:55Z)
- Twice regularized MDPs and the equivalence between robustness and regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
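The construction behind this kind of approach pairs each MDP state with an automaton state that tracks progress toward the temporal-logic task. The sketch below is a generic MDP-automaton product step with a naive reward on accepting states; the paper's EP-MDP and its LDGBA-based reward shaping and discounting differ in the details.

```python
def product_step(mdp_step, automaton_delta, labeling, accepting_states,
                 state, automaton_state, action):
    """Advance a generic MDP-automaton product by one step.

    mdp_step(state, action)        -> next MDP state (sampled from the MDP)
    automaton_delta(q, label_set)  -> next automaton state
    labeling(state)                -> atomic propositions holding in state
    A unit reward is given when the automaton reaches an accepting state;
    this is a naive shaping choice for illustration only.
    """
    next_state = mdp_step(state, action)
    next_q = automaton_delta(automaton_state, labeling(next_state))
    reward = 1.0 if next_q in accepting_states else 0.0
    return (next_state, next_q), reward

# Toy usage for the task "eventually reach the goal cell":
step = lambda s, a: "goal" if a == "right" else s
delta = lambda q, labels: "accept" if "goal" in labels else q
label = lambda s: {"goal"} if s == "goal" else set()
product_state, r = product_step(step, delta, label, {"accept"}, "start", "q0", "right")
```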
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is not and must be learnt.
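As a toy picture of such a Mealy reward machine, the example below pays a reward only the first time the goal is reached after a key has been collected, a reward that no Markovian function of the current MDP state alone can express. Names are hypothetical; the paper's contribution is learning such a machine from observed traces, which is not shown here.

```python
class MealyRewardMachine:
    """Toy non-Markovian reward: pay 1 only the first time 'goal' is reached
    after 'key' has been collected."""

    def __init__(self):
        self.state = "no_key"

    def reward(self, mdp_state, action):
        # The action is part of the Mealy input even though this toy
        # machine happens to ignore it.
        if self.state == "no_key" and mdp_state == "key":
            self.state = "has_key"
        elif self.state == "has_key" and mdp_state == "goal":
            self.state = "done"
            return 1.0
        return 0.0

rm = MealyRewardMachine()
rewards = [rm.reward(s, "move") for s in ["start", "key", "goal", "goal"]]  # [0, 0, 1, 0]
```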
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.