Partial Policy Iteration for L1-Robust Markov Decision Processes
- URL: http://arxiv.org/abs/2006.09484v1
- Date: Tue, 16 Jun 2020 19:50:14 GMT
- Title: Partial Policy Iteration for L1-Robust Markov Decision Processes
- Authors: Chin Pang Ho and Marek Petrik and Wolfram Wiesemann
- Abstract summary: This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
- Score: 13.555107578858307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust Markov decision processes (MDPs) allow us to compute reliable solutions
for dynamic decision problems whose evolution is modeled by rewards and
partially-known transition probabilities. Unfortunately, accounting for
uncertainty in the transition probabilities significantly increases the
computational complexity of solving robust MDPs, which severely limits their
scalability. This paper describes new efficient algorithms for solving the
common class of robust MDPs with s- and sa-rectangular ambiguity sets defined
by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient,
flexible, and general policy iteration scheme for robust MDPs. We also propose
fast methods for computing the robust Bellman operator in quasi-linear time,
nearly matching the linear complexity of the non-robust Bellman operator. Our
experimental results indicate that the proposed methods are many orders of
magnitude faster than the state-of-the-art approach which uses linear
programming solvers combined with a robust value iteration.
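To make the abstract concrete, the sketch below illustrates the kind of inner problem that an sa-rectangular robust Bellman backup must solve for every state-action pair: minimizing the expected value over transition probabilities in an $L_1$ ball around the nominal distribution. This is a minimal, illustrative sketch only, restricted to the simpler unweighted $L_1$ case and plain value iteration; the paper's actual contribution is a quasi-linear algorithm for weighted $L_1$ norms embedded in partial policy iteration, which is not reproduced here. The function names, array layout (`P_hat` of shape (S, A, S), `R` of shape (S, A)), and the discount default are assumptions made for the example.

```python
import numpy as np

def worst_case_l1(p_hat, v, kappa):
    """Inner problem of an sa-rectangular robust Bellman update:
        minimize  p . v   s.t.  ||p - p_hat||_1 <= kappa,  p in the probability simplex.
    Greedy sketch for the UNWEIGHTED L1 ball only; the paper's quasi-linear
    algorithms handle weighted L1 norms and are not reproduced here."""
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    budget = kappa / 2.0            # moving delta mass changes the L1 distance by 2*delta
    dst = int(np.argmin(v))         # receive mass at the lowest-value successor state
    for src in np.argsort(-v):      # take mass from the highest-value successors first
        if src == dst:
            continue
        if budget <= 0:
            break
        delta = min(p[src], budget)
        p[src] -= delta
        p[dst] += delta
        budget -= delta
    return p

def robust_bellman_update(v, P_hat, R, kappa, gamma=0.95):
    """One sa-rectangular robust Bellman backup,
        (T v)(s) = max_a [ R[s, a] + gamma * min_{p in ball(s, a)} p . v ],
    with P_hat of shape (S, A, S) and R of shape (S, A) (illustrative layout)."""
    n_states, n_actions = R.shape
    v_new = np.empty(n_states)
    for s in range(n_states):
        v_new[s] = max(
            R[s, a] + gamma * worst_case_l1(P_hat[s, a], v, kappa) @ v
            for a in range(n_actions)
        )
    return v_new
```

The greedy step is the reason the non-weighted inner problem is cheap: mass is shifted from the highest-value successors to the single lowest-value one until the budget is exhausted, so a sort dominates the cost. The weighted case treated in the paper requires a more careful (homotopy-style) traversal to stay quasi-linear.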
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - An Efficient Solution to s-Rectangular Robust Markov Decision Processes [49.05403412954533]
We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs).
We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma.
We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
arXiv Detail & Related papers (2023-01-31T13:54:23Z) - First-order Policy Optimization for Robust Markov Decision Process [40.2022466644885]
We consider the problem of solving a robust Markov decision process (MDP), which involves a set of discounted, finite-state, finite-action MDPs with uncertain transition kernels.
For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we establish several structural observations on the robust objective.
arXiv Detail & Related papers (2022-09-21T18:10:28Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between s-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Robust Phi-Divergence MDPs [13.555107578858307]
We develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets.
We show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers.
arXiv Detail & Related papers (2022-05-27T19:08:55Z) - Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$\nu$-SDDP can significantly reduce problem-solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)