Robust Markov Decision Processes without Model Estimation
- URL: http://arxiv.org/abs/2302.01248v2
- Date: Tue, 12 Sep 2023 16:20:15 GMT
- Title: Robust Markov Decision Processes without Model Estimation
- Authors: Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
- Abstract summary: There are two major barriers to applying robust MDPs in practice.
First, most works study robust MDPs in a model-based regime.
Second, prior work typically assumes a strong oracle to obtain the optimal solution.
- Score: 32.16801929347098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust Markov Decision Processes (MDPs) have received considerable attention for
learning policies that are less sensitive to changes in the environment, and a growing
number of works analyze their sample efficiency. However, two major barriers limit the
application of robust MDPs in practice. First, most works study robust MDPs in a
model-based regime, where the transition probability must be estimated, requiring
$\mathcal{O}(|\mathcal{S}|^2|\mathcal{A}|)$ memory. Second, prior work typically assumes a
strong oracle that returns the optimal solution of an intermediate step when solving robust
MDPs; in practice, such an oracle usually does not exist. To remove the oracle, we transform
the original robust MDP into an alternative form that can be solved with stochastic gradient
methods, and we prove that this alternative form plays the same role as the original one.
With the new formulation, we devise a sample-efficient algorithm that solves robust MDPs in
a model-free regime: it requires no oracle and, in exchange for access to samples from a
generative model or a Markov chain, needs only $\mathcal{O}(|\mathcal{S}||\mathcal{A}|)$
storage. Finally, we validate our theoretical findings via numerical experiments,
demonstrating the efficiency of the alternative form of robust MDPs.
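To make the storage and oracle points concrete, below is a minimal, illustrative Python sketch of a model-free robust Q-learning-style update that stores only an $|\mathcal{S}| \times |\mathcal{A}|$ table and queries a generative model for single transitions. It uses an R-contamination uncertainty set of radius `rho` purely because that set admits a closed-form robust backup and therefore needs no oracle; the paper's actual algorithm handles more general uncertainty sets through its alternative formulation. The `sample_next_state` interface, step sizes, and toy MDP are assumptions for illustration only.

```python
import numpy as np


def robust_q_learning(sample_next_state, reward, num_states, num_actions,
                      gamma=0.9, rho=0.1, lr=0.1, num_iters=50_000, seed=0):
    """Illustrative model-free robust Q-learning sketch with O(|S||A|) storage.

    Uses an R-contamination uncertainty set of radius `rho`, whose robust
    Bellman backup has a closed form, so no inner-optimization oracle is
    needed. This is NOT the paper's algorithm, which targets more general
    uncertainty sets via its alternative formulation; `sample_next_state` is
    an assumed generative-model interface returning one next state.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))   # only |S||A| values are stored

    for _ in range(num_iters):
        s = rng.integers(num_states)
        a = rng.integers(num_actions)
        s_next = sample_next_state(s, a)      # one sample, no model estimate

        # Robust target: with prob. (1 - rho) nature follows the nominal
        # kernel, with prob. rho it jumps to the worst state for the agent.
        nominal_value = Q[s_next].max()
        worst_value = Q.max(axis=1).min()
        target = reward[s, a] + gamma * ((1 - rho) * nominal_value
                                         + rho * worst_value)

        Q[s, a] += lr * (target - Q[s, a])    # stochastic-approximation step
    return Q


if __name__ == "__main__":
    # Tiny random MDP, used only to exercise the sketch.
    S, A = 5, 3
    rng = np.random.default_rng(1)
    P = rng.dirichlet(np.ones(S), size=(S, A))   # nominal kernel (env side only)
    R = rng.random((S, A))

    Q = robust_q_learning(lambda s, a: rng.choice(S, p=P[s, a]),
                          reward=R, num_states=S, num_actions=A)
    print("greedy robust policy:", Q.argmax(axis=1))
```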
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences over outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z) - Model-Free Robust Average-Reward Reinforcement Learning [25.125481838479256]
We focus on robust average-reward MDPs in the model-free setting.
We design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution.
arXiv Detail & Related papers (2023-05-17T18:19:23Z) - Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - An Efficient Solution to s-Rectangular Robust Markov Decision Processes [49.05403412954533]
We present an efficient robust value iteration for $s$-rectangular robust Markov Decision Processes (MDPs).
We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma.
We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
arXiv Detail & Related papers (2023-01-31T13:54:23Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Robust Phi-Divergence MDPs [13.555107578858307]
We develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets.
We show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers.
arXiv Detail & Related papers (2022-05-27T19:08:55Z) - CP-MDP: A CANDECOMP-PARAFAC Decomposition Approach to Solve a Markov Decision Process Multidimensional Problem [21.79259092920586]
We develop an MDP solver for a multidimensional problem using a tensor decomposition method.
We show that our approach can compute much larger problems using substantially less memory.
arXiv Detail & Related papers (2021-02-27T21:33:19Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving the common class of robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
arXiv Detail & Related papers (2020-06-16T19:50:14Z) - Plannable Approximations to MDP Homomorphisms: Equivariance under Actions [72.30921397899684]
We introduce a contrastive loss function that enforces action equivariance on the learned representations (a rough sketch follows this list).
We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process.
We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
arXiv Detail & Related papers (2020-02-27T08:29:10Z)
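As a rough illustration of the contrastive action-equivariance idea described in the last related paper above (Plannable Approximations to MDP Homomorphisms), the following Python/PyTorch sketch pulls the latent prediction T(f(s), a) toward f(s') and pushes it away from the encoding of an unrelated state. The module architecture, hinge margin, and function names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentTransition(nn.Module):
    """Action-conditioned transition in latent space (assumed architecture)."""

    def __init__(self, latent_dim, num_actions):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Linear(latent_dim + num_actions, latent_dim)

    def forward(self, z, a):
        a_onehot = F.one_hot(a, self.num_actions).float()
        return z + self.net(torch.cat([z, a_onehot], dim=-1))  # residual step


def action_equivariance_loss(encoder, transition, s, a, s_next, s_other,
                             margin=1.0):
    """Hinge-style contrastive loss: T(f(s), a) should match f(s') (positive
    term) while staying at least `margin` away from encodings of unrelated
    states (negative term), so the latent transition mirrors the true one."""
    z_pred = transition(encoder(s), a)
    positive = (z_pred - encoder(s_next)).pow(2).sum(dim=-1)
    negative = F.relu(margin - (z_pred - encoder(s_other)).pow(2).sum(dim=-1))
    return (positive + negative).mean()
```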