Partial Policy Iteration for L1-Robust Markov Decision Processes
- URL: http://arxiv.org/abs/2006.09484v1
- Date: Tue, 16 Jun 2020 19:50:14 GMT
- Title: Partial Policy Iteration for L1-Robust Markov Decision Processes
- Authors: Chin Pang Ho and Marek Petrik and Wolfram Wiesemann
- Abstract summary: This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
- Score: 13.555107578858307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust Markov decision processes (MDPs) allow us to compute reliable solutions
for dynamic decision problems whose evolution is modeled by rewards and
partially-known transition probabilities. Unfortunately, accounting for
uncertainty in the transition probabilities significantly increases the
computational complexity of solving robust MDPs, which severely limits their
scalability. This paper describes new efficient algorithms for solving the
common class of robust MDPs with s- and sa-rectangular ambiguity sets defined
by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient,
flexible, and general policy iteration scheme for robust MDPs. We also propose
fast methods for computing the robust Bellman operator in quasi-linear time,
nearly matching the linear complexity of the non-robust Bellman operator. Our
experimental results indicate that the proposed methods are many orders of
magnitude faster than the state-of-the-art approach which uses linear
programming solvers combined with a robust value iteration.
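To make the abstract concrete, the sketch below illustrates the kind of inner problem that an sa-rectangular robust Bellman backup must solve for every state-action pair: minimizing the expected value over transition probabilities in an $L_1$ ball around the nominal distribution. This is a minimal, illustrative sketch only, restricted to the simpler unweighted $L_1$ case and plain value iteration; the paper's actual contribution is a quasi-linear algorithm for weighted $L_1$ norms embedded in partial policy iteration, which is not reproduced here. The function names, array layout (`P_hat` of shape (S, A, S), `R` of shape (S, A)), and the discount default are assumptions made for the example.

```python
import numpy as np

def worst_case_l1(p_hat, v, kappa):
    """Inner problem of an sa-rectangular robust Bellman update:
        minimize  p . v   s.t.  ||p - p_hat||_1 <= kappa,  p in the probability simplex.
    Greedy sketch for the UNWEIGHTED L1 ball only; the paper's quasi-linear
    algorithms handle weighted L1 norms and are not reproduced here."""
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    budget = kappa / 2.0            # moving delta mass changes the L1 distance by 2*delta
    dst = int(np.argmin(v))         # receive mass at the lowest-value successor state
    for src in np.argsort(-v):      # take mass from the highest-value successors first
        if src == dst:
            continue
        if budget <= 0:
            break
        delta = min(p[src], budget)
        p[src] -= delta
        p[dst] += delta
        budget -= delta
    return p

def robust_bellman_update(v, P_hat, R, kappa, gamma=0.95):
    """One sa-rectangular robust Bellman backup,
        (T v)(s) = max_a [ R[s, a] + gamma * min_{p in ball(s, a)} p . v ],
    with P_hat of shape (S, A, S) and R of shape (S, A) (illustrative layout)."""
    n_states, n_actions = R.shape
    v_new = np.empty(n_states)
    for s in range(n_states):
        v_new[s] = max(
            R[s, a] + gamma * worst_case_l1(P_hat[s, a], v, kappa) @ v
            for a in range(n_actions)
        )
    return v_new
```

The greedy step is the reason the non-weighted inner problem is cheap: mass is shifted from the highest-value successors to the single lowest-value one until the budget is exhausted, so a sort dominates the cost. The weighted case treated in the paper requires a more careful (homotopy-style) traversal to stay quasi-linear.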
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - An Efficient Solution to s-Rectangular Robust Markov Decision Processes [49.05403412954533]
We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs).
We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma.
We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
arXiv Detail & Related papers (2023-01-31T13:54:23Z) - First-order Policy Optimization for Robust Markov Decision Process [40.2022466644885]
We consider the problem of solving a robust Markov decision process (MDP), which involves a set of discounted, finite-state, finite-action MDPs with uncertain transition kernels.
For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we establish several structural observations on the robust objective.
arXiv Detail & Related papers (2022-09-21T18:10:28Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between s-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Robust Phi-Divergence MDPs [13.555107578858307]
We develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets.
We show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers.
arXiv Detail & Related papers (2022-05-27T19:08:55Z) - Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$\nu$-SDDP can significantly reduce problem-solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)