An Efficient Solution to s-Rectangular Robust Markov Decision Processes
- URL: http://arxiv.org/abs/2301.13642v1
- Date: Tue, 31 Jan 2023 13:54:23 GMT
- Title: An Efficient Solution to s-Rectangular Robust Markov Decision Processes
- Authors: Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor
- Abstract summary: We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs).
We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma.
We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
- Score: 49.05403412954533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an efficient robust value iteration for \texttt{s}-rectangular
robust Markov Decision Processes (MDPs) with a time complexity comparable to
standard (non-robust) MDPs, which is significantly faster than any existing
method. We do so by deriving the optimal robust Bellman operator in concrete
forms using our $L_p$ water filling lemma. We unveil the exact form of the
optimal policies, which turn out to be novel threshold policies with the
probability of playing an action proportional to its advantage.
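For intuition, here is a minimal Python sketch of such a threshold policy, assuming the simplest (p = 1 style) linear water-filling form; the function names, the `budget` parameter, and the linear clipping are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np

def water_filling_threshold(q, budget, tol=1e-10):
    """Bisect for the water level `lam` with sum_a max(q_a - lam, 0) == budget.

    Illustrative only: the paper's $L_p$ water-filling lemma determines the
    exact clipping exponent and budget from the uncertainty radius; here we
    use the simplest linear (p = 1 style) form.
    """
    lo, hi = q.min() - budget, q.max()      # clipped mass is monotone in lam
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if np.maximum(q - lam, 0.0).sum() > budget:
            lo = lam                        # level too low, mass overflows
        else:
            hi = lam
    return 0.5 * (lo + hi)

def threshold_policy(q, budget):
    """Play each action with probability proportional to its clipped advantage."""
    lam = water_filling_threshold(q, budget)
    adv = np.maximum(q - lam, 0.0)
    return adv / adv.sum()

q = np.array([1.0, 0.6, 0.2, -0.3])         # hypothetical Q-values at one state
print(threshold_policy(q, budget=0.5))      # only sufficiently good actions get mass
```

Bisection suffices because the total clipped mass decreases monotonically in the water level, which is what keeps the backup as cheap as its non-robust counterpart.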
Related papers
- Anytime-Constrained Reinforcement Learning [6.981971551979697]
We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints.
We show that optimal deterministic policies exist once the state is augmented with the cumulative cost.
We show that computing non-trivial approximately optimal policies is NP-hard in general.
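As a rough sketch of that augmentation (the environment interface `step(s, a) -> (s_next, reward, cost)` and all names below are hypothetical, not from the paper), a deterministic policy can condition on the cumulative cost incurred so far:

```python
from typing import Callable, Hashable

# An augmented policy maps (state, cumulative_cost) -> action.
AugmentedPolicy = Callable[[Hashable, float], int]

def run_episode(policy: AugmentedPolicy, step, s0, budget: float, horizon: int):
    """Roll out a cost-augmented deterministic policy.

    An anytime constraint requires cumulative cost <= budget after *every*
    step, which a deterministic policy can respect only because it sees the
    running cost `c` as part of its (augmented) state.
    """
    s, c, total_reward = s0, 0.0, 0.0
    for _ in range(horizon):
        a = policy(s, c)        # the action may depend on the remaining budget
        s, r, cost = step(s, a)
        c += cost
        assert c <= budget, "anytime constraint violated"
        total_reward += r
    return total_reward
```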
arXiv Detail & Related papers (2023-11-09T16:51:26Z) - Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
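As a familiar instance of a regularized Bellman backup (entropy regularization chosen purely for illustration; the paper's equivalence is built on reward and transition regularizers, not reproduced here):

```python
import numpy as np

def regularized_backup(q, tau=0.1):
    """Entropy-regularized backup at one state:
    max_pi <pi, q> + tau * H(pi) = tau * log(sum_a exp(q_a / tau)).
    """
    return tau * np.log(np.exp(q / tau).sum())

print(regularized_backup(np.array([1.0, 0.5, -0.2])))  # smooth maximum of q
```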
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Robust Markov Decision Processes without Model Estimation [32.16801929347098]
There are two major barriers to applying robust MDPs in practice.
First, most works study robust MDPs in a model-based regime.
Second, prior work typically assumes a strong oracle to obtain the optimal solution.
arXiv Detail & Related papers (2023-02-02T17:29:10Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
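A schematic single-state sketch of one such gradient step, where a hand-picked `robust_penalty` stands in for RPG's worst-case evaluation (the penalty values and softmax parameterization are assumptions for illustration):

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def rpg_step(theta, q_nominal, robust_penalty, lr=0.5):
    """One softmax policy-gradient step at a single state on the penalized
    values q = q_nominal - robust_penalty (a stand-in for a robust Q-estimate).
    """
    q = q_nominal - robust_penalty
    pi = softmax(theta)
    grad = pi * (q - pi @ q)        # d/d(theta) of E_pi[q] under softmax
    return theta + lr * grad

theta = np.zeros(3)
for _ in range(200):
    theta = rpg_step(theta, np.array([1.0, 0.8, 0.1]),
                     robust_penalty=np.array([0.3, 0.0, 0.0]))
print(softmax(theta))               # mass shifts to the best penalized action
```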
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - Robust Average-Reward Markov Decision Processes [25.125481838479256]
We focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set.
We take an approach that approximates average-reward MDPs using discounted MDPs.
We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value algorithm that provably finds its solution.
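A hedged sketch of relative value iteration with a crude span-based penalty standing in for the worst-case kernel (the paper's robust Bellman equation is the principled version; shapes and the penalty constant are assumptions):

```python
import numpy as np

def relative_value_iteration(P, r, beta=0.0, iters=500):
    """Average-reward relative value iteration with an optional penalty
    beta * span(h) / 2 on every backup, a rough proxy for an L1 kernel
    uncertainty of radius beta.  P[a] is S x S, r[a, s] is the reward.
    """
    A, S = r.shape
    h = np.zeros(S)
    gain = 0.0
    for _ in range(iters):
        penalty = beta * (h.max() - h.min()) / 2.0
        Th = np.max(r + np.stack([P[a] @ h for a in range(A)]) - penalty, axis=0)
        gain, h = Th[0], Th - Th[0]          # renormalize at a reference state
    return gain, h

P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])     # P[a, s, s']
r = np.array([[1.0, 0.0], [0.5, 0.8]])       # r[a, s]
print(relative_value_iteration(P, r, beta=0.1))
```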
arXiv Detail & Related papers (2023-01-02T19:51:55Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between \texttt{s}-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
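A simplified reading of those concrete forms for the easier sa-rectangular case, where kernel uncertainty appears as a value-function "variance" penalty (the q = 2 instance below and all constants are my assumptions, not the paper's exact expressions):

```python
import numpy as np

def value_variance(v):
    """min over omega of ||v - omega * 1||_2; the minimizer is the mean.
    The paper works with general q conjugate to p, not just q = 2."""
    return np.linalg.norm(v - v.mean())

def robust_backup_sa(r, P, v, gamma, alpha):
    """sa-rectangular robust Bellman backup at one state, written as a
    penalized nominal backup:
        max_a [ r_a + gamma * (P_a @ v) - gamma * alpha_a * value_variance(v) ]
    """
    return np.max(r + gamma * (P @ v) - gamma * alpha * value_variance(v))

r = np.array([1.0, 0.8])                      # rewards for 2 actions
P = np.array([[0.6, 0.3, 0.1],                # nominal kernels over 3 states
              [0.2, 0.5, 0.3]])
v = np.array([2.0, 1.0, 0.0])
print(robust_backup_sa(r, P, v, gamma=0.9, alpha=np.array([0.1, 0.1])))
```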
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits, where Bandits are employed to model the transition probability matrix.
We observe the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
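One plausible reading of "Bandits model the transition probability matrix" is a per-page Thompson-sampling bandit with a Beta posterior on each content's page-to-page conversion probability (illustrative; the class name and traffic simulation below are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

class BetaBandit:
    """Thompson sampling over the contents of one page; each arm's
    transition-to-next-page probability gets a Beta posterior."""
    def __init__(self, n_arms):
        self.a = np.ones(n_arms)                 # Beta(1, 1) priors
        self.b = np.ones(n_arms)

    def choose(self):
        return int(np.argmax(rng.beta(self.a, self.b)))

    def update(self, arm, converted):
        self.a[arm] += converted
        self.b[arm] += 1 - converted

page = BetaBandit(n_arms=3)
for _ in range(1000):                            # simulate traffic on one page
    arm = page.choose()
    page.update(arm, rng.random() < [0.10, 0.25, 0.15][arm])
print(page.a / (page.a + page.b))                # posterior mean conversion rates
```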
arXiv Detail & Related papers (2021-07-01T03:54:36Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of $L_1$-robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
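The workhorse inside such solvers is the inner worst-case expectation over an $L_1$ ball, which admits a classic greedy solution; below is a sketch under simplifying assumptions (unweighted norm, no tie handling):

```python
import numpy as np

def worst_case_l1(p, v, psi):
    """min over {p' in simplex : ||p' - p||_1 <= psi} of p'^T v.

    Greedy O(S log S) rule: shift up to psi / 2 probability mass from the
    highest-value states onto the single lowest-value state.
    """
    p = p.astype(float).copy()
    k = int(np.argmin(v))
    budget = min(psi / 2.0, 1.0 - p[k])      # total mass we may move
    for i in np.argsort(v)[::-1]:            # drain highest-value states first
        if budget <= 0.0 or i == k:
            continue
        take = min(p[i], budget)
        p[i] -= take
        p[k] += take
        budget -= take
    return float(p @ v)

p = np.array([0.25, 0.25, 0.25, 0.25])
v = np.array([10.0, 4.0, 2.0, 1.0])
print(worst_case_l1(p, v, psi=0.4))          # mass moves from v = 10 toward v = 1
```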
arXiv Detail & Related papers (2020-06-16T19:50:14Z)