Robust Phi-Divergence MDPs
- URL: http://arxiv.org/abs/2205.14202v1
- Date: Fri, 27 May 2022 19:08:55 GMT
- Title: Robust Phi-Divergence MDPs
- Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
- Abstract summary: We develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets.
We show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers.
- Score: 13.555107578858307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, robust Markov decision processes (MDPs) have emerged as a
prominent modeling framework for dynamic decision problems affected by
uncertainty. In contrast to classical MDPs, which only account for
stochasticity by modeling the dynamics through a stochastic process with a
known transition kernel, robust MDPs additionally account for ambiguity by
optimizing in view of the most adverse transition kernel from a prescribed
ambiguity set. In this paper, we develop a novel solution framework for robust
MDPs with s-rectangular ambiguity sets that decomposes the problem into a
sequence of robust Bellman updates and simplex projections. Exploiting the rich
structure present in the simplex projections corresponding to phi-divergence
ambiguity sets, we show that the associated s-rectangular robust MDPs can be
solved substantially faster than with state-of-the-art commercial solvers as
well as a recent first-order solution scheme, thus rendering them attractive
alternatives to classical MDPs in practical applications.
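To make the decomposition concrete, below is a minimal Python sketch of one robust Bellman update under a KL-divergence ambiguity set (the KL divergence is one member of the phi-divergence family). For simplicity it uses (s,a)-rectangular sets and solves the inner worst case through the standard one-dimensional convex dual rather than the simplex-projection routine developed in the paper; all names (worst_case_kl, robust_bellman_update, kappa) and the scipy-based dual solve are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def worst_case_kl(v, p_hat, kappa):
    """min_{p in simplex, KL(p||p_hat) <= kappa} p @ v, computed via the standard
    1-D convex dual: max_{lam >= 0} -lam*kappa - lam*log E_{p_hat}[exp(-v/lam)]."""
    v_min = v.min()
    def dual(lam):
        # shift v by its minimum so the exponentials stay in [0, 1]
        w = np.dot(p_hat, np.exp(-(v - v_min) / lam))
        return -lam * kappa + v_min - lam * np.log(max(w, 1e-300))
    res = minimize_scalar(lambda lam: -dual(lam), bounds=(1e-6, 1e3), method="bounded")
    return dual(res.x)

def robust_bellman_update(V, P_hat, R, gamma, kappa):
    """One robust value-iteration sweep: P_hat[s, a] is the nominal transition
    distribution, R[s, a] the reward; the adversary picks the worst kernel in the KL ball."""
    n_states, n_actions = R.shape
    V_new = np.empty(n_states)
    for s in range(n_states):
        q = [R[s, a] + gamma * worst_case_kl(V, P_hat[s, a], kappa)
             for a in range(n_actions)]
        V_new[s] = max(q)
    return V_new
```

The paper's contribution is, roughly, to replace the generic inner solve above with fast, structure-exploiting projections onto the probability simplex for general phi-divergences and for s-rectangular sets that couple all actions of a state.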
Related papers
- Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Model-Free Robust Average-Reward Reinforcement Learning [25.125481838479256]
We focus on robust average-reward MDPs in the model-free setting.
We design two model-free algorithms, robust relative value (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution.
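As a rough schematic (not the paper's exact estimator or step sizes), robust relative-value-iteration Q-learning follows the usual RVI template with the expectation over next states replaced by the support function of the ambiguity set \(\mathcal{P}_{s,a}\):

\[
Q_{n+1}(s,a) = Q_n(s,a) + \alpha_n \Big[ r(s,a) - f(Q_n) + \sigma_{\mathcal{P}_{s,a}}\big(\max_{a'} Q_n(\cdot,a')\big) - Q_n(s,a) \Big],
\qquad
\sigma_{\mathcal{P}}(v) = \min_{p \in \mathcal{P}} p^{\top} v,
\]

where \(f(Q_n)\) is a fixed reference offset (e.g. \(Q_n\) at a chosen state-action pair) that anchors the average reward.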
arXiv Detail & Related papers (2023-05-17T18:19:23Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Robust Markov Decision Processes without Model Estimation [32.16801929347098]
There are two major barriers to applying robust MDPs in practice.
First, most works study robust MDPs in a model-based regime.
Second, prior work typically assumes a strong oracle to obtain the optimal solution.
arXiv Detail & Related papers (2023-02-02T17:29:10Z) - On the convex formulations of robust Markov decision processes [12.100620186370012]
There is no known analog of the MDP convex optimization formulation for solving RMDPs.
We derive a convex formulation with a number of variables and constraints that is polynomial in the number of states and actions, but with large coefficients in the constraints.
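For reference, the classical (non-robust) MDP admits the textbook linear-programming formulation below; the obstacle the paper addresses is that replacing the known kernel \(p\) with a worst case over an ambiguity set destroys this linearity, so a new convex formulation is needed (notation here is standard, not taken from the paper):

\[
\min_{v \in \mathbb{R}^{S}} \; \sum_{s} \mu(s)\, v(s)
\quad \text{s.t.} \quad
v(s) \ge r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v(s') \quad \forall s, a,
\]

where \(\mu\) is any strictly positive weight vector over states.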
arXiv Detail & Related papers (2022-09-21T08:39:02Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust to ambiguity in the underlying transition probabilities.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value iteration and (modified) policy iteration.
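One plausible form of the robust entropy-regularized Bellman operator consistent with this description (the precise definition is in the paper) is

\[
(T v)(s) = \max_{\pi(\cdot \mid s) \in \Delta_A} \; \min_{p_s \in \mathcal{P}_s} \;
\sum_{a} \pi(a \mid s) \Big( r(s,a) + \gamma\, p_{s,a}^{\top} v \Big) + \tau\, H\big(\pi(\cdot \mid s)\big),
\]

with \(H\) the Shannon entropy, \(\tau > 0\) the regularization temperature, and \(\mathcal{P}_s\) an s-rectangular ambiguity set.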
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Twice regularized MDPs and the equivalence between robustness and
regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
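The reward-robust half of this equivalence can be read off a standard dual-norm identity: if the reward vector at a state is perturbed adversarially inside a norm ball of radius \(\alpha\), the worst case is exactly a policy-norm penalty (a textbook fact, included only as intuition for the equivalence):

\[
\min_{\|\delta_s\| \le \alpha} \; \langle \pi_s, \, r_s + \delta_s \rangle
= \langle \pi_s, r_s \rangle - \alpha \, \|\pi_s\|_{*},
\]

where \(\|\cdot\|_{*}\) denotes the dual norm, so evaluating a policy against worst-case rewards coincides with a regularized evaluation.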
arXiv Detail & Related papers (2021-10-12T18:33:45Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
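The core primitive behind Stein variational approaches is the SVGD particle update, which here would transport a set of candidate control sequences toward the posterior over good plans (this is the generic SVGD rule, not a claim about the paper's exact algorithm):

\[
x_i \leftarrow x_i + \epsilon\, \hat{\phi}(x_i),
\qquad
\hat{\phi}(x) = \frac{1}{m} \sum_{j=1}^{m} \Big[ k(x_j, x)\, \nabla_{x_j} \log p(x_j) + \nabla_{x_j} k(x_j, x) \Big],
\]

with \(k\) a positive-definite kernel, \(p\) the target density over control sequences, and \(\epsilon\) a step size.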
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new, efficient algorithms for solving the common class of robust MDPs with ambiguity sets defined by L1 norms.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z)
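For unweighted L1 ambiguity sets in particular, the inner worst case over the simplex can be computed by a simple sort-based construction: shift mass toward the lowest-value successor state and remove it from the highest-value ones. A minimal Python sketch of this standard routine (names and the unweighted-ball assumption are illustrative, not the paper's weighted variant):

```python
import numpy as np

def worst_case_l1(v, p_hat, budget):
    """min_{p in simplex, ||p - p_hat||_1 <= budget} p @ v."""
    p = p_hat.astype(float).copy()
    order = np.argsort(v)                    # states sorted by ascending value
    lo = order[0]
    eps = min(budget / 2.0, 1.0 - p[lo])     # mass that can move to the worst state
    p[lo] += eps
    for s in order[::-1]:                    # take the same mass from the best states
        if s == lo:
            continue
        take = min(eps, p[s])
        p[s] -= take
        eps -= take
        if eps <= 1e-12:
            break
    return float(p @ v)
```

A routine like this sits inside every robust Bellman update, which is what makes fast policy-evaluation schemes such as partial policy iteration pay off.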