Risk-aware Stochastic Shortest Path
- URL: http://arxiv.org/abs/2203.01640v1
- Date: Thu, 3 Mar 2022 10:59:54 GMT
- Title: Risk-aware Stochastic Shortest Path
- Authors: Tobias Meggendorfer
- Abstract summary: We treat the problem of risk-aware control for stochastic shortest path (SSP) on Markov decision processes (MDPs).
We present an alternative view, instead optimizing conditional value-at-risk (CVaR), an established risk measure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We treat the problem of risk-aware control for stochastic shortest path (SSP)
on Markov decision processes (MDPs). Typically, expectation is considered for
SSP, which however is oblivious to the incurred risk. We present an alternative
view, instead optimizing conditional value-at-risk (CVaR), an established risk
measure. We treat both Markov chains and MDPs and introduce, through
novel insights, two algorithms, based on linear programming and value
iteration, respectively. Both algorithms offer precise and provably correct
solutions. Evaluation of our prototype implementation shows that risk-aware
control is feasible on several moderately sized models.
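As a point of reference for the objective being optimized, a minimal sketch follows: the CVaR at level alpha of the accumulated SSP cost is the expected cost over the worst alpha-fraction of runs. The toy Markov chain, the unit step cost, and the Monte-Carlo estimator below are hypothetical illustrations only; they are not the paper's linear-programming or value-iteration algorithms, which compute the same quantity exactly.
```python
import numpy as np

# Hypothetical 3-state Markov chain (not from the paper): states 0 and 1 are
# transient, state 2 is the absorbing goal; every step incurs a cost of 1.
P = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.0, 0.0, 1.0],
])

def sample_total_cost(rng, start=0, step_cost=1.0, max_steps=10_000):
    """Accumulated cost of one run until the goal state is reached."""
    s, cost = start, 0.0
    for _ in range(max_steps):
        if s == 2:
            return cost
        cost += step_cost
        s = rng.choice(3, p=P[s])
    return cost  # truncated run; acceptable for a rough illustration

def empirical_cvar(costs, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled costs (higher = worse)."""
    worst_first = np.sort(costs)[::-1]
    k = max(1, int(np.ceil(alpha * len(costs))))
    return worst_first[:k].mean()

rng = np.random.default_rng(0)
costs = np.array([sample_total_cost(rng) for _ in range(20_000)])
print("expected total cost:", costs.mean())
print("CVaR_0.1 total cost:", empirical_cvar(costs, alpha=0.1))
```
The gap between the two printed numbers is exactly the kind of risk that expectation-based SSP control ignores and that the paper's CVaR formulation accounts for.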
Related papers
- Robust Stochastic Shortest-Path Planning via Risk-Sensitive Incremental Sampling [9.651071174735804]
We propose a risk-aware Rapidly-Exploring Random Trees (RRT*) planning algorithm for Stochastic Shortest-Path (SSP) problems.
Our motivation rests on the step-wise coherence of the Conditional Value-at-Risk (CVaR) risk measure and the optimal substructure of the SSP problem.
Our simulation results show that incorporating risk into the tree growth process yields paths with lengths that are significantly less sensitive to variations in the noise parameter.
arXiv Detail & Related papers (2024-08-16T11:21:52Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents [3.8980564330208662]
We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
arXiv Detail & Related papers (2023-01-30T01:22:31Z) - Risk-Averse MDPs under Reward Ambiguity [9.929659318167731]
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity.
A scalable first-order algorithm is designed to solve large-scale problems.
arXiv Detail & Related papers (2023-01-03T11:06:30Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Risk-Averse Stochastic Shortest Path Planning [25.987787625028204]
We show that optimal, stationary, Markovian policies exist and can be found via a special Bellman's equation.
A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
arXiv Detail & Related papers (2021-03-26T20:49:14Z) - Risk-Averse Bayes-Adaptive Reinforcement Learning [3.5289688061934963]
We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs).
We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs.
Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
arXiv Detail & Related papers (2021-02-10T22:34:33Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
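To make the setting of the last entry concrete, here is a minimal, hypothetical Thompson-Sampling-style loop for a mean-variance multi-armed bandit. The Gaussian posterior model, the risk weight rho, and the mean-minus-rho-times-variance objective are generic illustration choices, not necessarily the algorithm analysed in that paper.
```python
import numpy as np

def ts_mean_variance(arm_means, arm_stds, rho=1.0, horizon=2000, seed=0):
    """Hypothetical Thompson-Sampling-style loop for a mean-variance MAB.

    Each arm's mean gets a Normal posterior (unit-variance observation model);
    the played arm maximizes sampled_mean - rho * empirical_variance.
    This is a generic illustration, not the cited paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts, sums, sq_sums = np.zeros(k), np.zeros(k), np.zeros(k)

    for t in range(horizon):
        if t < k:                      # play each arm once to initialise
            a = t
        else:
            post_mean = sums / counts
            post_var = 1.0 / counts    # posterior variance over the mean shrinks
            sampled_mean = rng.normal(post_mean, np.sqrt(post_var))
            emp_var = sq_sums / counts - post_mean ** 2
            a = int(np.argmax(sampled_mean - rho * emp_var))
        r = rng.normal(arm_means[a], arm_stds[a])
        counts[a] += 1
        sums[a] += r
        sq_sums[a] += r ** 2
    return counts

# A high-mean/high-variance arm vs. a slightly lower-mean/low-variance arm:
# with rho = 1.0 the loop shifts most pulls toward the low-variance arm.
print(ts_mean_variance(arm_means=[1.0, 0.9], arm_stds=[2.0, 0.1], rho=1.0))
```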
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences.