Linear Mixture Distributionally Robust Markov Decision Processes
- URL: http://arxiv.org/abs/2505.18044v1
- Date: Fri, 23 May 2025 15:48:11 GMT
- Title: Linear Mixture Distributionally Robust Markov Decision Processes
- Authors: Zhishuai Liu, Pan Xu,
- Abstract summary: The distributionally robust Markov decision process (DRMDP) addresses this challenge by finding a robust policy that performs well under the worst-case environment.<n>We propose a novel linear mixture DRMDP framework, where the nominal dynamics is assumed to be a linear mixture model.<n>We show that this new framework provides a more refined representation of uncertainties compared to conventional models.
- Score: 6.969949986864736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world decision-making problems face the off-dynamics challenge: the agent learns a policy in a source domain and deploys it in a target domain with different state transitions. The distributionally robust Markov decision process (DRMDP) addresses this challenge by finding a robust policy that performs well under the worst-case environment within a pre-specified uncertainty set of transition dynamics. Its effectiveness heavily hinges on the proper design of these uncertainty sets, based on prior knowledge of the dynamics. In this work, we propose a novel linear mixture DRMDP framework, where the nominal dynamics is assumed to be a linear mixture model. In contrast with existing uncertainty sets directly defined as a ball centered around the nominal kernel, linear mixture DRMDPs define the uncertainty sets based on a ball around the mixture weighting parameter. We show that this new framework provides a more refined representation of uncertainties compared to conventional models based on $(s,a)$-rectangularity and $d$-rectangularity, when prior knowledge about the mixture model is present. We propose a meta algorithm for robust policy learning in linear mixture DRMDPs with general $f$-divergence defined uncertainty sets, and analyze its sample complexities under three divergence metrics instantiations: total variation, Kullback-Leibler, and $\chi^2$ divergences. These results establish the statistical learnability of linear mixture DRMDPs, laying the theoretical foundation for future research on this new setting.
Related papers
- Robust Offline Reinforcement Learning for Non-Markovian Decision Processes [48.9399496805422]
We study the learning problem of robust offline non-Markovian RL.<n>We introduce a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of the uncertainty set.<n>By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $epsilon$-optimal robust policy.
arXiv Detail & Related papers (2024-11-12T03:22:56Z) - Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration [0.6906005491572401]
We show that Information Bottleneck-based IRM achieves consistent calibration across different environments.
Our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated.
arXiv Detail & Related papers (2024-01-31T02:08:43Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - On the Foundation of Distributionally Robust Reinforcement Learning [19.621038847810198]
We contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL)
This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary.
Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP)
arXiv Detail & Related papers (2023-11-15T15:02:23Z) - Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity [7.57543767554282]
This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure.
We derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method.
We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI)
arXiv Detail & Related papers (2023-06-20T15:51:25Z) - Distributionally Robust Model-Based Offline Reinforcement Learning with
Near-Optimal Sample Complexity [39.886149789339335]
offline reinforcement learning aims to learn to perform decision making from history data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization is an objective based on the idea for learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective--as well as these recently proposed alternatives--under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution--this is precisely the issue that it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - Stochastic Modified Equations for Continuous Limit of Stochastic ADMM [13.694172299830315]
We put different variants of ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach.
We show that the dynamics of ADMM is approximated by a class of differential equations with small noise parameters in the sense of weak approximation.
arXiv Detail & Related papers (2020-03-07T08:01:50Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.