Regular Decision Processes for Grid Worlds
- URL: http://arxiv.org/abs/2111.03647v2
- Date: Tue, 9 Nov 2021 08:55:21 GMT
- Title: Regular Decision Processes for Grid Worlds
- Authors: Nicky Lenaers and Martijn van Otterlo
- Abstract summary: We describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions.
We provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, interest has grown in recent years in combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior-learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
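To make the setting concrete: a regular but non-Markovian reward in a grid world is one whose value depends on the trajectory so far through a condition expressible by a finite automaton. The sketch below is a minimal, hypothetical illustration of that idea (the grid layout, the KEY/GOAL cells, and all names are assumptions for exposition, not the paper's tool chain): the agent is rewarded at a goal cell only if it has previously visited a key cell, and a two-state automaton tracks that history so that the product of automaton state and grid position is Markovian again.

```python
# Minimal sketch (not the authors' tool chain): a grid world with a
# regular, non-Markovian reward "reach GOAL after having visited KEY".
# The reward depends on the whole trajectory, but the dependence is
# regular, so a two-state automaton tracks it; the pair (automaton
# state, grid position) is Markovian again.

KEY, GOAL = (0, 2), (2, 2)   # hypothetical cells on a 3x3 grid
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(pos, action):
    """Deterministic grid dynamics on a 3x3 grid (walls clip moves)."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), 2)
    c = min(max(pos[1] + dc, 0), 2)
    return (r, c)


def automaton_step(q, pos):
    """Two-state DFA: q=0 means 'key not yet visited', q=1 means 'visited'."""
    return 1 if (q == 1 or pos == KEY) else q


def reward(q, pos):
    """Non-Markovian reward, made Markovian on the product state (q, pos)."""
    return 1.0 if (q == 1 and pos == GOAL) else 0.0


# Tiny rollout: the same grid position yields different rewards depending
# on the history, which is exactly what the automaton state q captures.
pos, q = (0, 0), 0
for a in ["right", "right", "down", "down"]:   # passes through KEY, then GOAL
    pos = step(pos, a)
    q = automaton_step(q, pos)
    print(a, pos, reward(q, pos))
```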
Related papers
- Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching [0.0]
We propose a comprehensive framework for policy gradient methods tailored to continuous-time reinforcement learning.
This is based on the connection between control problems and their randomised counterparts, enabling applications across various classes of Markovian continuous-time control problems.
arXiv Detail & Related papers (2024-04-27T15:41:06Z)
- Model-Based Reinforcement Learning Control of Reaction-Diffusion Problems [0.0]
Reinforcement learning has been applied to decision-making in several applications, most notably in games.
We introduce two novel reward functions to drive the flow of the transported field.
Results show that certain controls can be implemented successfully in these applications.
arXiv Detail & Related papers (2024-02-22T11:06:07Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics at deployment from the training environment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
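For intuition about the KL uncertainty set mentioned above, the worst-case expectation inside a distributionally robust Bellman backup admits a standard scalar dual form. The snippet below is a generic, hypothetical sketch of that inner problem for a finite support (it is not the Gaussian-process, maximum-variance-reduction method of the paper, and all names are illustrative).

```python
# Minimal sketch of the worst-case expectation under a KL uncertainty set,
# i.e. the inner problem of a distributionally robust Bellman backup.
import numpy as np


def kl_robust_expectation(values, nominal_probs, radius, betas=None):
    """
    inf_{P : KL(P || P0) <= radius} E_P[V]
      = max_{beta > 0}  -beta * log E_{P0}[exp(-V / beta)] - beta * radius
    Solved here by a crude grid search over the scalar dual variable beta.
    """
    if betas is None:
        betas = np.logspace(-3, 3, 200)
    v, p0 = np.asarray(values, float), np.asarray(nominal_probs, float)
    best = -np.inf
    for beta in betas:
        shifted = -v / beta
        m = shifted.max()
        log_mgf = m + np.log(np.sum(p0 * np.exp(shifted - m)))  # log E_P0[exp(-V/beta)]
        best = max(best, -beta * log_mgf - beta * radius)
    return best


# Robust one-step backup for a single (state, action) pair:
# reward + gamma * worst-case expected next value over the KL ball.
next_values = np.array([0.0, 1.0, 2.0])
nominal = np.array([0.2, 0.5, 0.3])
q_robust = 0.5 + 0.9 * kl_robust_expectation(next_values, nominal, radius=0.1)
print(q_robust)
```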
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Dynamic deep-reinforcement-learning algorithm in Partially Observed Markov Decision Processes [6.729108277517129]
This study shows the benefit of including action sequences in order to solve Partially Observable Markov Decision Processes.
The developed algorithms showed enhanced robustness of controller performance against different types of external disturbances.
arXiv Detail & Related papers (2023-07-29T08:52:35Z)
- Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that, even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z)
- Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded from the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length $m$.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
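The decodability assumption above has a simple operational reading: if the latent state is a function of the last $m$ observations and actions, then feeding an $m$-step window to a standard RL agent restores (approximate) Markovianity. The wrapper below is a minimal, hypothetical sketch of that idea (the reset/step environment interface is an assumption), not the paper's moment-matching algorithm.

```python
# Minimal sketch: expose the last m (action, observation) pairs as the state,
# so that an ordinary RL method can be run on a short-memory POMDP.
from collections import deque


class ShortMemoryWrapper:
    """Turns a POMDP-style env into one whose observation is the last m steps."""

    def __init__(self, env, m):
        self.env, self.m = env, m
        self.window = deque(maxlen=m)

    def reset(self):
        obs = self.env.reset()
        self.window.clear()
        self.window.append((None, obs))          # no action before the first obs
        return tuple(self.window)

    def step(self, action):
        # Assumed interface: env.step returns (obs, reward, done).
        obs, reward, done = self.env.step(action)
        self.window.append((action, obs))        # keep only the last m pairs
        return tuple(self.window), reward, done
```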
arXiv Detail & Related papers (2022-02-08T16:39:57Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Efficient PAC Reinforcement Learning in Regular Decision Processes [99.02383154255833]
We study reinforcement learning in regular decision processes.
Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters of the underlying decision process.
arXiv Detail & Related papers (2021-05-14T12:08:46Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
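The perturbation idea can be illustrated directly: an argmax is piecewise constant, so its gradient is zero almost everywhere, but averaging the argmax of randomly perturbed inputs gives a smooth surrogate. The snippet below is a minimal Monte Carlo sketch of that idea (the function name and parameters are illustrative, not the paper's implementation).

```python
# Minimal sketch: a smoothed one-hot argmax obtained by averaging the argmax
# of Gaussian-perturbed scores.  The result varies smoothly with theta, unlike
# the plain argmax, which is locally constant.
import numpy as np


def perturbed_argmax(theta, epsilon=0.5, samples=1000, rng=None):
    """Estimates E_Z[ onehot(argmax(theta + epsilon * Z)) ] by Monte Carlo."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta, float)
    out = np.zeros_like(theta)
    for _ in range(samples):
        z = rng.standard_normal(theta.shape)
        out[np.argmax(theta + epsilon * z)] += 1.0
    return out / samples           # a probability vector, smooth in theta


print(perturbed_argmax([1.0, 1.1, 0.2]))   # mass spread over the two close scores
```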
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.