Twice regularized MDPs and the equivalence between robustness and
regularization
- URL: http://arxiv.org/abs/2110.06267v1
- Date: Tue, 12 Oct 2021 18:33:45 GMT
- Title: Twice regularized MDPs and the equivalence between robustness and
regularization
- Authors: Esther Derman, Matthieu Geist, Shie Mannor
- Abstract summary: We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
- Score: 65.58188361659073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially
known system dynamics. To solve them, one typically resorts to robust
optimization methods. However, this significantly increases computational
complexity and limits scalability in both learning and planning. On the other
hand, regularized MDPs show more stability in policy learning without impairing
time complexity. Yet, they generally do not encompass uncertainty in the model
dynamics. In this work, we aim to learn robust MDPs using regularization. We
first show that regularized MDPs are a particular instance of robust MDPs with
uncertain reward. We thus establish that policy iteration on reward-robust MDPs
can have the same time complexity as on regularized MDPs. We further extend
this relationship to MDPs with uncertain transitions: this leads to a
regularization term with an additional dependence on the value function. We
finally generalize regularized MDPs to twice regularized MDPs (R${}^2$ MDPs),
i.e., MDPs with $\textit{both}$ value and policy regularization. The
corresponding Bellman operators enable developing policy iteration schemes with
convergence and robustness guarantees. This framework also reduces planning and
learning in robust MDPs to planning and learning in regularized MDPs.
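As a concrete illustration of the reward-robust/regularization equivalence stated in the abstract, the sketch below (a toy numpy example, not code from the paper; the ball-shaped uncertainty set, the radius alpha, and all variable names are illustrative assumptions) evaluates a fixed policy in two ways: once against the closed-form worst-case reward in a state-wise L2 ball around the nominal reward, and once on the nominal MDP with the policy regularizer alpha_s * ||pi_s||_2. The two fixed points coincide, which is the reward-uncertainty case of the equivalence; the paper's R^2 formulation additionally handles transition uncertainty through a value-dependent regularization term.

```python
# Minimal numerical sketch (not the paper's code): for a ball-shaped,
# state-rectangular reward uncertainty set, policy evaluation on the
# reward-robust MDP coincides with policy evaluation on a regularized MDP
# whose regularizer is alpha_s * ||pi_s||_2. The toy MDP and all names
# below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9

P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)   # transition kernel
r0 = rng.random((S, A))                                        # nominal reward
pi = rng.random((S, A)); pi /= pi.sum(axis=1, keepdims=True)   # fixed policy
alpha = 0.1 * np.ones(S)                                       # radius of the reward ball per state

def evaluate(reward_term):
    """Iterate v(s) = reward_term(s) + gamma * <pi_s, P_s v> to its fixed point."""
    v = np.zeros(S)
    for _ in range(1000):
        v = reward_term + gamma * np.einsum('sa,sat,t->s', pi, P, v)
    return v

# (1) Robust evaluation: plug in the worst-case reward over the L2 ball,
#     which has the closed form r_s = r0_s - alpha_s * pi_s / ||pi_s||_2.
worst_r = r0 - alpha[:, None] * pi / np.linalg.norm(pi, axis=1, keepdims=True)
v_robust = evaluate(np.einsum('sa,sa->s', pi, worst_r))

# (2) Regularized evaluation: nominal reward minus the policy regularizer
#     Omega_s(pi_s) = alpha_s * ||pi_s||_2.
v_reg = evaluate(np.einsum('sa,sa->s', pi, r0) - alpha * np.linalg.norm(pi, axis=1))

print(np.max(np.abs(v_robust - v_reg)))   # ~0: reward-robust and regularized evaluations agree
```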
Related papers
- The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes including value iteration or (modified) policy iteration.
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z) - A Relation Analysis of Markov Decision Process Frameworks [26.308541799686505]
We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literature.
We show that the entropy-regularized MDP is equivalent to an MDP model, and is strictly subsumed by the general regularized MDP.
arXiv Detail & Related papers (2020-08-18T09:27:26Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of $L_1$-robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z)