Risk-Averse MDPs under Reward Ambiguity
- URL: http://arxiv.org/abs/2301.01045v2
- Date: Wed, 4 Jan 2023 02:52:33 GMT
- Title: Risk-Averse MDPs under Reward Ambiguity
- Authors: Haolin Ruan, Zhi Chen and Chin Pang Ho
- Abstract summary: We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity.
A scalable first-order algorithm is designed to solve large-scale problems.
- Score: 9.929659318167731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a distributionally robust return-risk model for Markov decision
processes (MDPs) under risk and reward ambiguity. The proposed model optimizes
the weighted average of mean and percentile performances, and it covers the
distributionally robust MDPs and the distributionally robust chance-constrained
MDPs (both under reward ambiguity) as special cases. By considering that the
unknown reward distribution lies in a Wasserstein ambiguity set, we derive the
tractable reformulation for our model. In particular, we show that the
return-risk model can also account for risk from an uncertain transition kernel
when one only seeks deterministic policies, and that a distributionally robust
MDP under the percentile criterion can be reformulated as its nominal
counterpart at an adjusted risk level. A scalable first-order algorithm is
designed to solve large-scale problems, and we demonstrate the advantages of
our proposed model and algorithm through numerical experiments.
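As a rough, non-authoritative illustration of the mean/percentile trade-off at the heart of the return-risk objective, the sketch below scores a batch of sampled returns by a weighted average of their empirical mean and an empirical lower percentile (a value-at-risk style tail statistic). The weight `lam`, risk level `alpha`, and the synthetic returns are illustrative assumptions; the paper's actual model optimizes this trade-off under a worst-case reward distribution in a Wasserstein ambiguity set, which is not reproduced here.

```python
import numpy as np

def return_risk_score(returns, lam=0.5, alpha=0.1):
    """Weighted average of mean and percentile performance (illustrative sketch).

    `lam` weights the empirical mean return against the empirical
    alpha-percentile of the return distribution (a low percentile captures
    the bad tail). This is a stand-in for the mean/percentile trade-off,
    not the paper's distributionally robust reformulation.
    """
    returns = np.asarray(returns, dtype=float)
    mean_part = returns.mean()
    percentile_part = np.percentile(returns, 100 * alpha)  # low returns = bad tail
    return lam * mean_part + (1.0 - lam) * percentile_part

# Example: returns simulated from some policy under sampled reward realizations.
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=10.0, scale=3.0, size=1000)
print(return_risk_score(sampled_returns, lam=0.7, alpha=0.05))
```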
Related papers
- Data-driven decision-making under uncertainty with entropic risk measure [5.407319151576265]
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss.
To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure.
We show that cross validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
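For reference, the entropic risk of a loss L at risk-aversion theta > 0 is (1/theta) log E[exp(theta L)]. The snippet below computes the naive plug-in estimate of this quantity, which is the biased estimator that the bootstrapping procedure above is designed to correct; the parameter value and synthetic losses are illustrative.

```python
import numpy as np

def empirical_entropic_risk(losses, theta=1.0):
    """Plug-in estimator (1/theta) * log( mean( exp(theta * loss) ) ).

    Computed in log-sum-exp form for numerical stability. The naive
    estimator is biased for heavy-tailed losses and small samples, which
    is what the debiasing procedure mentioned above targets.
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    m = theta * losses
    lse = m.max() + np.log(np.exp(m - m.max()).sum())  # log sum exp(theta * L)
    return (lse - np.log(n)) / theta

rng = np.random.default_rng(1)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # heavy-tailed losses
print(empirical_entropic_risk(losses, theta=0.5))
```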
arXiv Detail & Related papers (2024-09-30T04:02:52Z)
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
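As a hedged illustration of how an uncertainty Bellman equation propagates value uncertainty through the dynamics, the generic tabular recursion below follows the classical UBE form (local uncertainty plus discounted expected downstream uncertainty). It is not the specific equation proposed in this paper, and the mean transition model and local-uncertainty term are assumed given.

```python
import numpy as np

def uncertainty_bellman_iteration(P_mean, pi, local_uncertainty, gamma=0.99, iters=500):
    """Generic tabular uncertainty Bellman recursion (illustrative only).

    u(s,a) = w(s,a) + gamma^2 * sum_{s'} P_mean(s'|s,a) * sum_{a'} pi(a'|s') u(s',a')

    P_mean:            (S, A, S) mean transition probabilities
    pi:                (S, A)    policy probabilities
    local_uncertainty: (S, A)    per-state-action local uncertainty term w(s,a)
    """
    u = np.zeros_like(local_uncertainty, dtype=float)
    for _ in range(iters):
        v_u = (pi * u).sum(axis=1)                       # expected uncertainty at next state
        u = local_uncertainty + gamma**2 * (P_mean @ v_u)  # propagate through the dynamics
    return u
```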
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Decision-Dependent Distributionally Robust Markov Decision Process Method in Dynamic Epidemic Control [4.644416582073023]
The Susceptible-Exposed-Infectious-Recovered (SEIR) model is widely used to represent the spread of infectious diseases.
We present a Distributionally Robust Markov Decision Process (DRMDP) approach for addressing the dynamic epidemic control problem.
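For context, a minimal discrete-time SEIR simulation is sketched below; the rates `beta`, `sigma`, and `gamma_r`, the population size, and the horizon are placeholder values, and the DRMDP control layer described above is not reproduced.

```python
def simulate_seir(beta=0.3, sigma=0.2, gamma_r=0.1, days=160, n=1_000_000, i0=100):
    """Discrete-time SEIR compartmental model with illustrative parameters.

    S -> E at rate beta * S * I / N, E -> I at rate sigma, I -> R at rate gamma_r.
    """
    s, e, i, r = n - i0, 0.0, float(i0), 0.0
    history = []
    for _ in range(days):
        new_exposed = beta * s * i / n
        new_infectious = sigma * e
        new_recovered = gamma_r * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history

peak_infectious = max(step[2] for step in simulate_seir())
print(f"peak infectious: {peak_infectious:,.0f}")
```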
arXiv Detail & Related papers (2023-06-24T20:19:04Z)
- Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity [7.57543767554282]
This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure.
We derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method.
We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI).
arXiv Detail & Related papers (2023-06-20T15:51:25Z)
- On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z)
- The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
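The computational core of an RMDP is a robust Bellman backup that takes the worst case over the uncertainty set at every update. The sketch below uses a finite set of candidate transition kernels as a stand-in for that set; it is a generic illustration, not the generative-model sample-complexity analysis of the paper.

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.95, iters=500):
    """Robust value iteration over a finite set of candidate transition kernels.

    P_candidates: (K, S, A, S) candidate transition models in the uncertainty set
    R:            (S, A)       reward function
    Returns the robust (worst-case) optimal value function V of shape (S,).
    """
    K, S, A, _ = P_candidates.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q_k(s,a) = R(s,a) + gamma * sum_{s'} P_k(s'|s,a) V(s') for each candidate k
        Q_all = R[None, :, :] + gamma * (P_candidates @ V)  # (K, S, A)
        Q_robust = Q_all.min(axis=0)                        # worst case over the set
        V = Q_robust.max(axis=1)                            # greedy over actions
    return V
```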
arXiv Detail & Related papers (2023-05-26T02:32:03Z)
- Risk-aware Stochastic Shortest Path [0.0]
We treat the problem of risk-aware control for stochastic shortest path (SSP) on Markov decision processes (MDPs).
We present an alternative view, instead optimizing conditional value-at-risk (CVaR), an established risk measure.
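For reference, CVaR at level alpha of a cost is the expected cost over the worst alpha-fraction of outcomes. The snippet below computes an empirical estimate via the Rockafellar-Uryasev representation; the synthetic path costs and the level alpha are illustrative.

```python
import numpy as np

def empirical_cvar(costs, alpha=0.05):
    """Empirical CVaR_alpha of a cost: mean over the worst alpha-fraction of outcomes.

    Equivalently min_t { t + E[(cost - t)^+] / alpha }, with the plug-in
    minimizer t equal to the empirical (1 - alpha)-quantile (value-at-risk).
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, 1.0 - alpha)        # value-at-risk threshold
    excess = np.maximum(costs - var, 0.0)
    return var + excess.mean() / alpha

rng = np.random.default_rng(2)
path_costs = rng.gamma(shape=2.0, scale=5.0, size=10_000)  # e.g. traversal costs to goal
print(empirical_cvar(path_costs, alpha=0.05))
```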
arXiv Detail & Related papers (2022-03-03T10:59:54Z)
- Risk-Averse Bayes-Adaptive Reinforcement Learning [3.5289688061934963]
We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs).
We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs.
Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
arXiv Detail & Related papers (2021-02-10T22:34:33Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
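The sampler underlying this approach is Stein variational gradient descent (SVGD), which moves a set of particles toward a target distribution using kernelized score gradients. The sketch below applies the standard SVGD step to a toy Gaussian target; the RBF bandwidth, step size, and target are illustrative, and the MPC rollout machinery is omitted.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update: phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ].

    particles:  (n, d) particle positions
    grad_log_p: callable mapping (n, d) positions to the (n, d) score of the target
    """
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]        # diffs[j, i] = x_j - x_i
    sq_dists = (diffs ** 2).sum(-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth))               # RBF kernel k(x_j, x_i)
    grads = grad_log_p(particles)                                # (n, d)
    attract = kernel @ grads                                     # sum_j k(x_j, x_i) grad log p(x_j)
    repulse = (kernel[:, :, None] * (-diffs / bandwidth)).sum(0) # sum_j grad_{x_j} k(x_j, x_i)
    return particles + step_size * (attract + repulse) / n

# Toy target: standard Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
for _ in range(200):
    x = svgd_step(x, lambda p: -p, step_size=0.05)
print(x.mean(axis=0))  # particles drift toward the target mean (near 0)
```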
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
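A heavily simplified sketch of the idea: Thompson sampling draws arm parameters from a posterior and selects the arm maximizing a sampled mean-variance score rather than the sampled mean alone. The flat-prior Gaussian posterior, known noise scale, and risk weight `rho` below are illustrative simplifications and not the paper's exact algorithms or objective.

```python
import numpy as np

def mean_variance_thompson(arm_means, noise_std=1.0, rho=1.0, horizon=2000, seed=0):
    """Simplified mean-variance Thompson sampling for Gaussian arms.

    Each round, sample a mean per arm from a crude Gaussian posterior and pull
    the arm maximizing sampled_mean - rho * variance_estimate (a generic
    mean-variance trade-off, not the paper's exact criterion).
    """
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts = np.ones(k)          # one fake pull per arm to avoid division by zero
    sums = np.zeros(k)
    sum_sq = np.zeros(k)
    for _ in range(horizon):
        post_mean = sums / counts
        post_std = noise_std / np.sqrt(counts)
        sampled_mean = rng.normal(post_mean, post_std)
        var_est = np.maximum(sum_sq / counts - post_mean**2, 1e-6)
        arm = int(np.argmax(sampled_mean - rho * var_est))
        reward = rng.normal(arm_means[arm], noise_std)
        counts[arm] += 1
        sums[arm] += reward
        sum_sq[arm] += reward**2
    return counts - 1            # pulls per arm, excluding the fake pull

print(mean_variance_thompson(arm_means=[0.0, 0.5, 1.0]))
```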
arXiv Detail & Related papers (2020-02-01T15:33:50Z)