Efficient Algorithms for Robust Markov Decision Processes with $s$-Rectangular Ambiguity Sets
- URL: http://arxiv.org/abs/2602.05591v1
- Date: Thu, 05 Feb 2026 12:20:26 GMT
- Title: Efficient Algorithms for Robust Markov Decision Processes with $s$-Rectangular Ambiguity Sets
- Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
- Abstract summary: We develop a unified solution framework for a class of robust MDPs with $s$-rectangular ambiguity sets. Using our algorithms, we show that $s$-rectangular robust MDPs with $1$- and $2$-norm as well as $\varphi$-divergence ambiguity sets can be solved several orders of magnitude faster than with state-of-the-art commercial solvers.
- Score: 19.053062263045664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust Markov decision processes (MDPs) have attracted significant interest due to their ability to protect MDPs from poor out-of-sample performance in the presence of ambiguity. In contrast to classical MDPs, which account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, a robust MDP additionally accounts for ambiguity by optimizing against the most adverse transition kernel from an ambiguity set constructed via historical data. In this paper, we develop a unified solution framework for a broad class of robust MDPs with $s$-rectangular ambiguity sets, where the most adverse transition probabilities are considered independently for each state. Using our algorithms, we show that $s$-rectangular robust MDPs with $1$- and $2$-norm as well as $\varphi$-divergence ambiguity sets can be solved several orders of magnitude faster than with state-of-the-art commercial solvers, and often only a logarithmic factor slower than classical MDPs. We demonstrate the favorable scaling properties of our algorithms on a range of synthetically generated as well as standard benchmark instances.
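As a rough illustration of the inner adversarial problem that drives such frameworks, the sketch below computes the worst-case expected value over a KL ball (one instance of a $\varphi$-divergence ambiguity set) around a nominal distribution, via the standard one-dimensional Lagrangian dual. It simplifies to the $(s,a)$-rectangular case and leans on an off-the-shelf scalar optimizer rather than the paper's specialized algorithms; the name `worst_case_kl` and the budget parameter `kappa` are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def worst_case_kl(p_bar, v, kappa):
    """Worst-case expected value  min_p p @ v  over the KL ball
    {p : KL(p || p_bar) <= kappa} intersected with the simplex,
    evaluated through its one-dimensional dual:
        max_{lam > 0} -lam * log(sum_i p_bar_i * exp(-v_i / lam)) - lam * kappa
    """
    def neg_dual(lam):
        z = -v / lam
        m = z.max()                                    # stable log-sum-exp
        lse = m + np.log(np.sum(p_bar * np.exp(z - m)))
        return lam * lse + lam * kappa                 # negated dual objective
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun
```

A full robust value iteration would call such a routine once per state-action pair inside every Bellman update; the paper's point is precisely that generic inner solvers of this kind can be replaced by much faster specialized ones.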
Related papers
- Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs [7.694676064731969]
We show that robust policy iteration runs in strongly polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor. The existence of strongly polynomial-time algorithms is a fundamental problem for these optimization models.
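The inner step that such a policy iteration repeatedly solves, minimizing an expected value over an $L_\infty$ ball of transition probabilities, admits a simple greedy solution. The sketch below illustrates that step only; the paper's contribution is the strong polynomiality analysis, not this routine, and `worst_case_linf` and `kappa` are illustrative names.

```python
import numpy as np

def worst_case_linf(p_bar, v, kappa):
    """Greedy solution of min_p p @ v subject to |p_i - p_bar_i| <= kappa
    for every i and p in the probability simplex: shift as much mass as
    the per-coordinate budget allows from high-value to low-value states."""
    order = np.argsort(v)                    # states by value, ascending
    p = np.asarray(p_bar, dtype=float).copy()
    lo, hi = 0, len(v) - 1
    room = min(kappa, 1.0 - p[order[lo]])    # mass the cheapest state can absorb
    avail = min(kappa, p[order[hi]])         # mass the dearest state can shed
    while lo < hi:
        moved = min(room, avail)
        p[order[lo]] += moved
        p[order[hi]] -= moved
        room -= moved
        avail -= moved
        if room <= 1e-12:                    # cheapest state saturated
            lo += 1
            room = min(kappa, 1.0 - p[order[lo]]) if lo < hi else 0.0
        if avail <= 1e-12:                   # dearest state exhausted
            hi -= 1
            avail = min(kappa, p[order[hi]]) if lo < hi else 0.0
    return p @ v
```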
arXiv Detail & Related papers (2026-01-30T17:57:07Z)
- Solving Robust Markov Decision Processes: Generic, Reliable, Efficient [3.789219860006095]
Markov decision processes (MDPs) are a well-established model for sequential decision-making in the presence of uncertainties. We provide a framework for solving RMDPs that is generic, reliable, and efficient. Our prototype implementation outperforms existing tools by several orders of magnitude.
arXiv Detail & Related papers (2024-12-13T14:55:48Z)
- Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP).
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z)
- The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [71.59406356321101]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z)
- Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z)
- An Efficient Solution to s-Rectangular Robust Markov Decision Processes [49.05403412954533]
We present an efficient robust value iteration for $s$-rectangular robust Markov decision processes (MDPs).
We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water-filling lemma.
We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
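As a minimal sketch of that threshold structure for a single state (not the paper's exact construction, whose exponents and normalization depend on the chosen $L_p$ norm): actions are played with probability proportional to their advantage above a water level $\theta$, with $\theta$ found by bisection; the mass budget `lam` is a hypothetical parameter.

```python
import numpy as np

def threshold_policy(q, lam):
    """Play each action with probability proportional to its clipped
    advantage (q_a - theta)_+, with the water level theta chosen by
    bisection so that the clipped advantages sum to lam > 0."""
    lo, hi = q.min() - lam, q.max()
    for _ in range(100):                       # bisect on the water level
        theta = 0.5 * (lo + hi)
        mass = np.clip(q - theta, 0.0, None).sum()
        if mass > lam:                         # level too low: raise it
            lo = theta
        else:
            hi = theta
    w = np.clip(q - lo, 0.0, None)             # use lo so total mass stays positive
    return w / w.sum()
```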
arXiv Detail & Related papers (2023-01-31T13:54:23Z)
- Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z)
- Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
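For intuition, in the simpler $(s,a)$-rectangular $L_1$ case the inner minimization over the ambiguity ball (ignoring the simplex boundary) collapses to the nominal expectation minus a span-seminorm penalty, so the robust update is literally a regularized nominal update. The sketch below illustrates only this simplified case; shapes and names are assumptions, and the $s$-rectangular setting couples actions in ways not captured here.

```python
import numpy as np

def regularized_bellman(P_bar, R, v, kappa, gamma=0.95):
    """One regularized Bellman update matching the (s,a)-rectangular L1
    robust update when the simplex boundary is inactive: the adversary's
    best response costs kappa/2 * (max v - min v).
    P_bar: nominal kernel (S, A, S); R: rewards (S, A); v: values (S,)."""
    penalty = 0.5 * kappa * (v.max() - v.min())   # span-seminorm regularizer
    q = R + gamma * (P_bar @ v - penalty)         # shape (S, A)
    return q.max(axis=1)
```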
arXiv Detail & Related papers (2022-05-28T04:05:20Z)
- Robust Phi-Divergence MDPs [13.555107578858307]
We develop a novel solution framework for robust MDPs with $s$-rectangular ambiguity sets.
We show that the associated $s$-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers.
arXiv Detail & Related papers (2022-05-27T19:08:55Z)
- An Adaptive State Aggregation Algorithm for Markov Decision Processes [10.494611365482028]
We propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values.
Our algorithm converges almost surely to within $2\varepsilon / (1 - \gamma)$ of the true optimal value in the $\ell_\infty$ norm, where $\gamma$ is the discount factor and aggregated states differ by at most $\varepsilon$.
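The grouping idea can be mimicked in a few lines. The toy sketch below (not the authors' algorithm, which adapts the partition and carries the guarantee above) buckets states whose current values lie within $\varepsilon$ of each other and lets each bucket share one value between sweeps.

```python
import numpy as np

def aggregated_value_iteration(P, R, gamma, eps, iters=200):
    """Toy variant of value iteration that, after each sweep, snaps
    states with similar values to a shared bucket average.
    P: kernel (S, A, S); R: rewards (S, A); eps: bucket width."""
    v = np.zeros(R.shape[0])
    for _ in range(iters):
        v = (R + gamma * (P @ v)).max(axis=1)    # standard Bellman sweep
        buckets = np.floor(v / eps).astype(int)  # states within eps share a bucket
        for b in np.unique(buckets):
            members = buckets == b
            v[members] = v[members].mean()       # one value per aggregate state
    return v
```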
arXiv Detail & Related papers (2021-07-23T07:19:43Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of robust MDPs with ambiguity sets defined by $L_1$ norms.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z)
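For the unweighted, $(s,a)$-rectangular special case of the $L_1$ sets this line of work targets, the inner adversarial problem has a well-known greedy solution, sketched below; partial policy iteration wraps such inner solvers in a policy-iteration loop. `worst_case_l1` and `kappa` are illustrative names, not the paper's API.

```python
import numpy as np

def worst_case_l1(p_bar, v, kappa):
    """Greedy solution of min_p p @ v subject to ||p - p_bar||_1 <= kappa
    and p in the probability simplex: add up to kappa/2 mass to the
    lowest-value state and remove the same amount from the highest-value
    states."""
    p = np.asarray(p_bar, dtype=float).copy()
    i_min = np.argmin(v)
    eps = min(kappa / 2.0, 1.0 - p[i_min])   # mass added to the worst state
    p[i_min] += eps
    to_remove = eps                          # matching mass to take away
    for i in np.argsort(v)[::-1]:            # highest values first
        if i == i_min:
            continue
        take = min(p[i], to_remove)
        p[i] -= take
        to_remove -= take
        if to_remove <= 0.0:
            break
    return p @ v
```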