Learning and Solving Regular Decision Processes
- URL: http://arxiv.org/abs/2003.01008v1
- Date: Mon, 2 Mar 2020 16:36:16 GMT
- Title: Learning and Solving Regular Decision Processes
- Authors: Eden Abadi, Ronen I. Brafman
- Abstract summary: Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards.
We combine automata learning techniques with history clustering to learn such a Mealy machine and solve it by adapting MCTS.
- Score: 15.533842336139067
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Regular Decision Processes (RDPs) are a recently introduced model that
extends MDPs with non-Markovian dynamics and rewards. The non-Markovian
behavior is restricted to depend on regular properties of the history. These
can be specified using regular expressions or formulas in linear dynamic logic
over finite traces. Fully specified RDPs can be solved by compiling them into
an appropriate MDP. Learning RDPs from data is a challenging problem that has
yet to be addressed, on which we focus in this paper. Our approach rests on a
new representation for RDPs using Mealy Machines that emit a distribution and
an expected reward for each state-action pair. Building on this representation,
we combine automata learning techniques with history clustering to learn such a
Mealy machine and solve it by adapting MCTS to it. We empirically evaluate this
approach, demonstrating its feasibility.
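To make the representation concrete, the sketch below shows one possible way to encode such a Mealy machine in Python: automaton states summarize the regular property of the history, transitions are driven by (action, observation) pairs, and each (state, action) pair emits a next-observation distribution and an expected reward. All names are hypothetical and the code only illustrates the representation; it is not the authors' implementation.

```python
import random

class RDPMealyMachine:
    """Illustrative Mealy-machine representation of an RDP (hypothetical names)."""

    def __init__(self, initial_state, transitions, outputs):
        # transitions: (state, action, observation) -> next automaton state
        # outputs:     (state, action) -> (observation distribution, expected reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.outputs = outputs

    def step(self, state, action):
        """Sample an observation and reward, then advance the automaton state."""
        obs_dist, expected_reward = self.outputs[(state, action)]
        observations, probabilities = zip(*obs_dist.items())
        obs = random.choices(observations, weights=probabilities, k=1)[0]
        return obs, expected_reward, self.transitions[(state, action, obs)]

# Toy RDP: the reward of action "a" depends on whether observation "x" has
# ever been seen -- a non-Markovian, but regular, property of the history.
machine = RDPMealyMachine(
    initial_state="never_seen_x",
    transitions={
        ("never_seen_x", "a", "x"): "seen_x",
        ("never_seen_x", "a", "y"): "never_seen_x",
        ("seen_x", "a", "x"): "seen_x",
        ("seen_x", "a", "y"): "seen_x",
    },
    outputs={
        ("never_seen_x", "a"): ({"x": 0.5, "y": 0.5}, 0.0),
        ("seen_x", "a"): ({"x": 0.5, "y": 0.5}, 1.0),
    },
)

state = machine.initial_state
for _ in range(5):
    observation, reward, state = machine.step(state, "a")
```

Because the machine's states are Markovian by construction, a sampling-based planner such as MCTS can be run directly over (machine state, action) pairs once the machine has been learned.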
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs)
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
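A simple way to picture that reconstruction step is to cluster history prefixes whose empirical next-observation statistics are indistinguishable and treat each cluster as a candidate automaton state. The sketch below only illustrates this general idea under simplifying assumptions (a fixed tolerance, no consistency check on transitions); it is not the algorithm of this related paper nor of the main paper above.

```python
from collections import Counter, defaultdict

def cluster_histories(episodes, tolerance=0.1):
    """Cluster history prefixes by their empirical next-(action, observation)
    statistics. Illustrative only; real automata-learning algorithms also
    enforce consistency of the transitions between the learned states."""
    counts = defaultdict(Counter)
    for episode in episodes:  # episode: [(action, observation), ...]
        for t in range(len(episode)):
            prefix = tuple(episode[:t])
            counts[prefix][episode[t]] += 1

    def distribution(prefix):
        total = sum(counts[prefix].values())
        return {key: n / total for key, n in counts[prefix].items()}

    def close(p, q):
        keys = set(p) | set(q)
        return all(abs(p.get(k, 0.0) - q.get(k, 0.0)) <= tolerance for k in keys)

    clusters, assignment = [], {}
    for prefix in counts:
        dist = distribution(prefix)
        for representative, cluster_id in clusters:
            if close(dist, representative):
                assignment[prefix] = cluster_id
                break
        else:
            clusters.append((dist, len(clusters)))
            assignment[prefix] = len(clusters) - 1
    return assignment

episodes = [[("a", "x"), ("a", "y")], [("a", "y"), ("a", "y")]]
candidate_states = cluster_histories(episodes)
```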
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
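For a concrete instance of a regularized MDP, a negative-entropy regularizer turns the Bellman max into a log-sum-exp ("soft") backup. The sketch below shows such a soft value iteration in a tabular setting; it is a generic illustration of regularization with hypothetical names, not the specific robust/regularized equivalence established in the paper.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.95, beta=5.0, iters=500):
    """Value iteration with an entropy-regularized (soft) Bellman operator.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A).
    beta is the inverse regularization strength; beta -> infinity recovers
    the usual hard max.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * (P @ v)                # Q-values, shape (S, A)
        m = q.max(axis=1)                      # stabilize the log-sum-exp
        v = m + np.log(np.exp(beta * (q - m[:, None])).sum(axis=1)) / beta
    policy = np.exp(beta * (q - v[:, None]))   # softmax (Boltzmann) policy
    return v, policy / policy.sum(axis=1, keepdims=True)
```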
arXiv Detail & Related papers (2023-03-12T13:03:28Z)
- BATS: Best Action Trajectory Stitching [22.75880303352508]
We introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset.
We prove that this property allows one to derive upper and lower bounds on the value function, up to appropriate distance metrics.
We show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
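The core data manipulation can be pictured as follows: logged transitions define a tabular MDP, and extra "stitching" edges connect a transition's end state to other logged states that lie close to it. The code below is a simplified nearest-neighbour illustration with hypothetical names; BATS itself relies on a learned dynamics model and penalizes the uncertainty of added transitions.

```python
import numpy as np

def build_stitched_mdp(transitions, stitch_radius=0.1):
    """Form a tabular MDP over logged data and add stitching edges.

    transitions: list of (state, action, reward, next_state) tuples with
    states given as 1-D numpy arrays. Simplified sketch, not the BATS
    algorithm: new edges simply connect a logged next-state to any other
    logged state within stitch_radius.
    """
    # Index every distinct (rounded) state that appears in the log.
    index = {}
    for s, _, _, s2 in transitions:
        for state in (s, s2):
            index.setdefault(tuple(np.round(state, 3)), len(index))

    def sid(state):
        return index[tuple(np.round(state, 3))]

    edges = []  # (state_id, action, reward, next_state_id, is_stitched)
    for s, a, r, s2 in transitions:
        edges.append((sid(s), a, r, sid(s2), False))
        for key, j in index.items():
            if 0.0 < np.linalg.norm(np.array(key) - s2) <= stitch_radius:
                edges.append((sid(s), a, r, j, True))
    return index, edges
```

Planning on the stitched tabular MDP and then behavior-cloning its optimal policy is the rough shape of the pipeline the summary describes.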
arXiv Detail & Related papers (2022-04-26T01:48:32Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
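What distinguishes the semi-Markov setting is that actions run for variable durations, so the bootstrap term is discounted by gamma raised to the elapsed time. Below is a minimal tabular sketch of that update with hypothetical state and action names; the paper's SDQN, SDDQN, and SBCQ are deep, offline variants of this idea rather than this exact update.

```python
from collections import defaultdict

def smdp_q_update(Q, s, a, cumulative_reward, duration, s_next, actions,
                  gamma=0.99, alpha=0.1):
    """One tabular semi-MDP Q-learning update.

    cumulative_reward: reward accumulated while action a was running for
    `duration` time steps; the bootstrap is discounted by gamma ** duration
    instead of a single gamma factor.
    """
    target = cumulative_reward + (gamma ** duration) * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Usage on a single logged transition (hypothetical warfarin-dosing names):
Q = defaultdict(float)
Q = smdp_q_update(Q, s="inr_low", a="increase_dose",
                  cumulative_reward=1.5, duration=3, s_next="inr_in_range",
                  actions=["increase_dose", "hold_dose"])
```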
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Solving the non-preemptive two queue polling model with generally distributed service and switch-over durations and Poisson arrivals as a Semi-Markov Decision Process [0.0]
The polling system with switch-over durations is a useful model with several practical applications.
It is classified as a Discrete Event Dynamic System (DEDS), for which no single agreed-upon modelling approach exists.
This paper presents a Semi-Markov Decision Process (SMDP) formulation of the polling system so as to introduce additional modelling power.
arXiv Detail & Related papers (2021-12-13T11:40:55Z)
- Twice regularized MDPs and the equivalence between robustness and regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
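The construction behind this kind of approach pairs each MDP state with an automaton state that tracks progress toward the temporal-logic task. The sketch below is a generic MDP-automaton product step with a naive reward on accepting states; the paper's EP-MDP and its LDGBA-based reward shaping and discounting differ in the details.

```python
def product_step(mdp_step, automaton_delta, labeling, accepting_states,
                 state, automaton_state, action):
    """Advance a generic MDP-automaton product by one step.

    mdp_step(state, action)        -> next MDP state (sampled from the MDP)
    automaton_delta(q, label_set)  -> next automaton state
    labeling(state)                -> atomic propositions holding in state
    A unit reward is given when the automaton reaches an accepting state;
    this is a naive shaping choice for illustration only.
    """
    next_state = mdp_step(state, action)
    next_q = automaton_delta(automaton_state, labeling(next_state))
    reward = 1.0 if next_q in accepting_states else 0.0
    return (next_state, next_q), reward

# Toy usage for the task "eventually reach the goal cell":
step = lambda s, a: "goal" if a == "right" else s
delta = lambda q, labels: "accept" if "goal" in labels else q
label = lambda s: {"goal"} if s == "goal" else set()
product_state, r = product_step(step, delta, label, {"accept"}, "start", "q0", "right")
```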
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Learning Non-Markovian Reward Models in MDPs [0.0]
We show how to formalise the non-Markovian reward function using a Mealy machine.
In our formal setting, we consider a Markov decision process (MDP) that models the dynamic of the environment in which the agent evolves.
While the MDP is known to the agent, the reward function is not and must be learnt.
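As a toy picture of such a Mealy reward machine, the example below pays a reward only the first time the goal is reached after a key has been collected, a reward that no Markovian function of the current MDP state alone can express. Names are hypothetical; the paper's contribution is learning such a machine from observed traces, which is not shown here.

```python
class MealyRewardMachine:
    """Toy non-Markovian reward: pay 1 only the first time 'goal' is reached
    after 'key' has been collected."""

    def __init__(self):
        self.state = "no_key"

    def reward(self, mdp_state, action):
        # The action is part of the Mealy input even though this toy
        # machine happens to ignore it.
        if self.state == "no_key" and mdp_state == "key":
            self.state = "has_key"
        elif self.state == "has_key" and mdp_state == "goal":
            self.state = "done"
            return 1.0
        return 0.0

rm = MealyRewardMachine()
rewards = [rm.reward(s, "move") for s in ["start", "key", "goal", "goal"]]  # [0, 0, 1, 0]
```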
arXiv Detail & Related papers (2020-01-25T10:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.