Efficient PAC Reinforcement Learning in Regular Decision Processes
- URL: http://arxiv.org/abs/2105.06784v1
- Date: Fri, 14 May 2021 12:08:46 GMT
- Title: Efficient PAC Reinforcement Learning in Regular Decision Processes
- Authors: Alessandro Ronca and Giuseppe De Giacomo
- Abstract summary: We study reinforcement learning in regular decision processes.
Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters that describe the underlying decision process.
- Score: 99.02383154255833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently regular decision processes have been proposed as a well-behaved form
of non-Markov decision process. Regular decision processes are characterised by
a transition function and a reward function that depend on the whole history,
though regularly (as in regular languages). In practice both the transition and
the reward functions can be seen as finite transducers. We study reinforcement
learning in regular decision processes. Our main contribution is to show that a
near-optimal policy can be PAC-learned in polynomial time in a set of
parameters that describe the underlying decision process. We argue that the
identified set of parameters is minimal and it reasonably captures the
difficulty of a regular decision process.
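To make the transducer view in the abstract concrete, below is a minimal sketch in Python (our illustration with invented names, not code from the paper) of a reward function that depends on the whole history of observations, but only through the finite state of a transducer. This is exactly the "regular" restriction the abstract describes.
```python
# A minimal sketch (not from the paper): a non-Markovian reward
# function represented as a finite-state transducer (Moore machine).
# The reward depends on the whole history, but only through the
# transducer's finite state.

class RewardTransducer:
    """Reward 1 once the history matches the regular pattern
    'an `a` was observed, followed later by a `b`'."""

    # states: 0 = nothing seen, 1 = saw `a`, 2 = saw `a` then `b`
    TRANSITIONS = {
        (0, "a"): 1, (0, "b"): 0,
        (1, "a"): 1, (1, "b"): 2,
        (2, "a"): 2, (2, "b"): 2,
    }
    OUTPUT = {0: 0.0, 1: 0.0, 2: 1.0}  # reward emitted in each state

    def __init__(self):
        self.state = 0

    def step(self, observation: str) -> float:
        self.state = self.TRANSITIONS[(self.state, observation)]
        return self.OUTPUT[self.state]


rt = RewardTransducer()
history = ["b", "a", "a", "b", "a"]
print([rt.step(obs) for obs in history])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```
Although this reward is non-Markovian in the raw observations, it is Markovian in the transducer's state; it is this finite-state structure that makes PAC learning in time polynomial in the process's parameters plausible.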
Related papers
- Omega-Regular Decision Processes [11.917126383341593]
We introduce omega-regular decision processes (ODPs), where the non-Markovian aspects of the transition and reward functions are extended to an omega-regular lookahead.
We present experimental results demonstrating the effectiveness of the proposed reduction.
arXiv Detail & Related papers (2023-12-14T01:58:51Z)
- Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
To understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse of this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Regular Decision Processes for Grid Worlds [0.0]
We describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions.
We provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
arXiv Detail & Related papers (2021-11-05T17:54:43Z)
- Counterfactual Explanations in Sequential Decision Making Under Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes.
In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions.
We show that the counterfactual explanations found by our algorithm can provide valuable insights to enhance decision making under uncertainty (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-07-06T17:38:19Z)
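As a toy illustration of the problem formulation above (our sketch with invented names, not the authors' algorithm): a counterfactual explanation here is an alternative action sequence within k substitutions of the observed one that achieves a strictly better outcome.
```python
from itertools import combinations, product

def counterfactuals(observed, actions, k, outcome):
    """Yield action sequences that differ from `observed` in at most k
    positions and achieve a strictly better outcome. `outcome` is a
    caller-supplied evaluator, e.g. expected return under a model of the
    environment. Brute force (and may revisit candidates); illustration only."""
    base = outcome(observed)
    for m in range(1, k + 1):
        for positions in combinations(range(len(observed)), m):
            for replacement in product(actions, repeat=m):
                candidate = list(observed)
                for pos, act in zip(positions, replacement):
                    candidate[pos] = act
                if candidate != list(observed) and outcome(candidate) > base:
                    yield candidate

# Toy usage: the outcome is simply the number of `1` actions taken.
for cf in counterfactuals([0, 0, 1], actions=[0, 1], k=1, outcome=sum):
    print(cf)  # [1, 0, 1] and [0, 1, 1]
```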
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs (a toy two-stage sketch follows this entry).
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
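For context, here is a toy sketch (our illustration on invented data, not the paper's method) of the two-stage predict-then-optimize baseline for MDPs: first fit a model that predicts rewards from features, then plan in the predicted MDP. Decision-focused learning instead trains the predictor through the quality of the policy that the planning step produces.
```python
import numpy as np

def value_iteration(P, r, gamma=0.9, iters=200):
    """P[a, s, s'] transition probabilities, r[s] state rewards.
    Returns the greedy policy and its value estimate."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = r[None, :] + gamma * (P @ V)  # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0), V

rng = np.random.default_rng(1)
S, A, D = 5, 2, 3                           # states, actions, feature dim
P = rng.dirichlet(np.ones(S), size=(A, S))  # random transition tensor
X = rng.normal(size=(S, D))                 # per-state features
r_true = X @ rng.normal(size=D)             # true (unknown) rewards

# Stage 1 (predict): fit rewards from noisy observations by least squares.
w_hat, *_ = np.linalg.lstsq(X, r_true + 0.1 * rng.normal(size=S), rcond=None)

# Stage 2 (optimize): plan in the MDP built from the predicted rewards.
policy, _ = value_iteration(P, X @ w_hat)

# Decision quality: evaluate the induced policy under the TRUE rewards.
# Decision-focused learning would train w_hat against this quantity directly.
P_pi = P[policy, np.arange(S)]
print(np.linalg.solve(np.eye(S) - 0.9 * P_pi, r_true))
```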
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems (a minimal sketch of the underlying update follows this entry).
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
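The method builds on Stein variational gradient descent (SVGD). Below is a minimal, self-contained sketch of the SVGD particle update on a toy 1-D Gaussian target (our illustration; the paper applies the idea to distributions over control sequences, not to this toy problem).
```python
import numpy as np

def svgd_step(x, score, eps=0.05):
    """One SVGD update for 1-D particles x, given the target's score
    function (the gradient of the log-density)."""
    diff = x[:, None] - x[None, :]                 # diff[j, i] = x_j - x_i
    sq = diff ** 2
    h = np.median(sq) / np.log(len(x) + 1) + 1e-8  # median bandwidth heuristic
    K = np.exp(-sq / h)                            # RBF kernel k(x_j, x_i)
    grad_K = -2.0 * diff / h * K                   # d k(x_j, x_i) / d x_j
    # Kernel-weighted attraction along the score, plus a repulsive term
    # that keeps the particles spread out.
    phi = (K @ score(x) + grad_K.sum(axis=0)) / len(x)
    return x + eps * phi

# Toy target: N(2, 1), whose score is -(x - 2).
score = lambda x: -(x - 2.0)
particles = np.random.default_rng(0).normal(-3.0, 0.5, size=50)
for _ in range(1000):
    particles = svgd_step(particles, score)
print(particles.mean(), particles.std())  # mean near 2.0, spread near 1.0
```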
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers (sketched below), and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
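As an illustration of the perturbation idea (our sketch with Gaussian noise; the paper develops the general theory and its guarantees): replacing a hard argmax with the expectation of an argmax over randomly perturbed inputs yields a smooth, differentiable operation.
```python
import numpy as np

def perturbed_argmax(theta, eps=0.5, n_samples=2000, seed=0):
    """Monte-Carlo estimate of y_eps(theta) = E[argmax_y <y, theta + eps*Z>]
    over one-hot vectors y, with Z standard Gaussian noise. Unlike the hard
    argmax, this smoothed map is differentiable in theta."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_samples, theta.size))
    winners = (theta[None, :] + eps * Z).argmax(axis=1)
    onehot = np.zeros((n_samples, theta.size))
    onehot[np.arange(n_samples), winners] = 1.0
    return onehot.mean(axis=0)

theta = np.array([1.0, 1.2, 0.8])
print(perturbed_argmax(theta))            # a smooth probability vector
print(perturbed_argmax(theta, eps=0.01))  # approaches the hard one-hot [0, 1, 0]
```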