Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization
- URL: http://arxiv.org/abs/2303.06654v1
- Date: Sun, 12 Mar 2023 13:03:28 GMT
- Title: Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization
- Authors: Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor
- Abstract summary: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
- Score: 64.60253456266872
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially
known system dynamics. To solve them, one typically resorts to robust
optimization methods. However, this significantly increases computational
complexity and limits scalability in both learning and planning. On the other
hand, regularized MDPs show more stability in policy learning without impairing
time complexity. Yet, they generally do not encompass uncertainty in the model
dynamics. In this work, we aim to learn robust MDPs using regularization. We
first show that regularized MDPs are a particular instance of robust MDPs with
uncertain reward. We thus establish that policy iteration on reward-robust MDPs
can have the same time complexity as on regularized MDPs. We further extend
this relationship to MDPs with uncertain transitions: this leads to a
regularization term with an additional dependence on the value function. We
then generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs),
i.e., MDPs with $\textit{both}$ value and policy regularization. The
corresponding Bellman operators enable us to derive planning and learning
schemes with convergence and generalization guarantees, thus reducing
robustness to regularization. We numerically show this two-fold advantage on
tabular and physical domains, highlighting the fact that $\text{R}^2$ preserves
its efficacy in continuous environments.
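For intuition, the $\text{R}^2$ Bellman operator described above combines a policy regularizer (induced by reward uncertainty) with a value-dependent penalty (induced by transition uncertainty). Schematically, for ball-shaped rectangular uncertainty sets with radii $\alpha_s$ and $\beta_s$,
$$[\mathcal{T}^{\text{R}^2} v](s) \;=\; \max_{\pi_s \in \Delta_{\mathcal{A}}} \big\{ \pi_s^{\top} r_s + \gamma\, \pi_s^{\top} P_s v - \alpha_s\, \Omega(\pi_s) - \beta_s\, \Omega(\pi_s)\, \lVert v \rVert \big\},$$
where $\Omega$ is a policy regularizer; the display is schematic and the exact norms and coefficients follow the paper's construction. The sketch below runs value iteration with an illustrative stand-in for such an operator on a tabular MDP: negative entropy replaces the norm-based policy regularizer (so the greedy step has a closed log-sum-exp form), and the value penalty is taken proportional to $\lVert v \rVert_\infty$. The function names, weights, and toy MDP are assumptions for illustration only, not the paper's implementation.
```python
import numpy as np

def r2_backup(v, r, P, gamma=0.9, alpha=0.1, beta=0.05):
    """One value-iteration step with an illustrative R^2-style Bellman operator.

    v: (S,) value vector, r: (S, A) reward table, P: (S, A, S) transition kernel.
    - Policy regularization: negative entropy with weight alpha, a stand-in for
      the norm penalty induced by ball-shaped reward uncertainty; the greedy
      step then has the closed form alpha * logsumexp(q / alpha).
    - Value regularization: a flat penalty beta * ||v||_inf, a stand-in for the
      value-dependent term induced by transition uncertainty.
    """
    q = r + gamma * (P @ v)                   # (S, A) action values
    q -= beta * np.linalg.norm(v, np.inf)     # value-dependent regularization
    qmax = q.max(axis=1, keepdims=True)       # stabilize the log-sum-exp
    return (qmax + alpha * np.log(np.exp((q - qmax) / alpha)
                                  .sum(axis=1, keepdims=True)))[:, 0]

def r2_value_iteration(r, P, iters=300, **kw):
    v = np.zeros(r.shape[0])
    for _ in range(iters):                    # contraction needs gamma + beta < 1
        v = r2_backup(v, r, P, **kw)
    return v

# Tiny random MDP, only to exercise the operator.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] is a distribution over next states
r = rng.uniform(size=(S, A))
print(r2_value_iteration(r, P))
```
With $\gamma + \beta < 1$, this illustrative operator is a sup-norm contraction, so the iteration converges; the paper establishes the corresponding convergence guarantees for its actual $\text{R}^2$ operators.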
Related papers
- Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the entropy-regularized MDP (ER-MDP) model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value or (modified) policy iteration.
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Twice regularized MDPs and the equivalence between robustness and
regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z) - RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z) - A Relation Analysis of Markov Decision Process Frameworks [26.308541799686505]
We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literature.
We show that the entropy-regularized MDP is equivalent to an MDP model, and is strictly subsumed by the general regularized MDP.
arXiv Detail & Related papers (2020-08-18T09:27:26Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)