Risk-Averse Stochastic Shortest Path Planning
- URL: http://arxiv.org/abs/2103.14727v1
- Date: Fri, 26 Mar 2021 20:49:14 GMT
- Title: Risk-Averse Stochastic Shortest Path Planning
- Authors: Mohamadreza Ahmadi, Anushri Dixit, Joel W. Burdick, and Aaron D. Ames
- Abstract summary: We show that optimal, stationary, Markovian policies exist and can be found via a special Bellman's equation.
A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
- Score: 25.987787625028204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the stochastic shortest path planning problem in MDPs, i.e., the
problem of designing policies that ensure reaching a goal state from a given
initial state with minimum accrued cost. In order to account for rare but
important realizations of the system, we consider a nested dynamic coherent
risk total cost functional rather than the conventional risk-neutral total
expected cost. Under some assumptions, we show that optimal, stationary,
Markovian policies exist and can be found via a special Bellman's equation. We
propose a computational technique based on difference convex programs (DCPs) to
find the associated value functions and therefore the risk-averse policies. A
rover navigation MDP is used to illustrate the proposed methodology with
conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent
risk measures.
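For orientation, the objects named in the abstract can be written out explicitly. A standard form of the nested (Markov) risk-averse Bellman equation for a stochastic shortest path MDP, together with the usual Rockafellar-Uryasev form of CVaR and the dual form of EVaR, is sketched below; here α is the tail probability (α = 1 recovers the risk-neutral expectation), and the notation is ours rather than the paper's:

\[
V(s) = \min_{a \in A(s)} \Big[ c(s,a) + \rho_\alpha\big( V(s') \,\big|\, s' \sim P(\cdot \mid s,a) \big) \Big], \qquad V(s_{\mathrm{goal}}) = 0,
\]
\[
\mathrm{CVaR}_\alpha(Z) = \inf_{z \in \mathbb{R}} \Big\{ z + \tfrac{1}{\alpha}\,\mathbb{E}\big[(Z - z)_+\big] \Big\},
\qquad
\mathrm{EVaR}_\alpha(Z) = \inf_{t > 0} \tfrac{1}{t} \log\!\Big( \tfrac{\mathbb{E}[e^{tZ}]}{\alpha} \Big),
\]

where ρ_α is the chosen coherent risk measure (CVaR or EVaR) applied to the cost-to-go of the successor state. A minimal numerical sketch of this backup with CVaR, using the risk-envelope (dual) representation of CVaR for a discrete successor distribution, follows; the function names, array layout, and the plain fixed-point iteration are illustrative assumptions, not the DCP-based computational technique the paper proposes:

```python
import numpy as np

def cvar_cost(values, probs, alpha):
    """CVaR_alpha of a discrete cost distribution via the risk-envelope
    (dual) form: maximize E_q[Z] over q with 0 <= q_i <= p_i / alpha and
    sum(q) = 1, i.e. push as much mass as allowed onto the worst outcomes."""
    order = np.argsort(values)[::-1]       # largest cost first
    q = np.zeros_like(probs, dtype=float)
    remaining = 1.0
    for i in order:
        q[i] = min(probs[i] / alpha, remaining)
        remaining -= q[i]
        if remaining <= 0.0:
            break
    return float(np.dot(q, values))

def cvar_bellman_backup(V, P, c, alpha, goal):
    """One sweep of the nested-CVaR Bellman operator for an SSP:
    V(s) <- min_a [ c(s,a) + CVaR_alpha(V(s')) ],  s' ~ P(.|s,a),  V(goal) = 0."""
    n_states, n_actions = c.shape
    V_new = np.empty(n_states)
    for s in range(n_states):
        if s == goal:
            V_new[s] = 0.0
            continue
        V_new[s] = min(c[s, a] + cvar_cost(V, P[s, a], alpha)
                       for a in range(n_actions))
    return V_new

# Tiny illustrative SSP: 3 states (0, 1, goal = 2), 2 actions per state.
# Action 0 is cheap but risky (may loop back); action 1 goes straight to the goal.
P = np.array([
    [[0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],   # from state 0
    [[0.5, 0.3, 0.2], [0.0, 0.0, 1.0]],   # from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # goal is absorbing
])
c = np.array([[1.0, 6.0], [1.0, 6.0], [0.0, 0.0]])
V = np.zeros(3)
for _ in range(200):                       # fixed-point iteration to convergence
    V = cvar_bellman_backup(V, P, c, alpha=0.2, goal=2)
print(V)  # with alpha=0.2 the direct action (cost 6) is optimal in states 0 and 1;
          # with alpha=1.0 the backup is the risk-neutral SSP recursion and the
          # risky route (expected cost 5) wins instead.
```

At α = 1 the backup reduces to the classical expected-cost SSP recursion, while smaller α makes the resulting policy increasingly sensitive to low-probability, high-cost transitions, which is the behavior the rover navigation example is meant to illustrate.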
Related papers
- Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR [12.719528972742394]
We show that the risk-averse total reward criterion can be optimized by a stationary policy.
Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
arXiv Detail & Related papers (2024-08-30T13:33:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z)
- Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over PI grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Constrained Risk-Averse Markov Decision Processes [18.467950783426947]
We consider the problem of designing policies for Markov decision processes with dynamic coherent risk objectives and constraints.
We propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem.
We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints (the baseline LP is sketched after this list).
arXiv Detail & Related papers (2020-12-04T06:12:11Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Cautious Reinforcement Learning via Distributional Risk in the Dual Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
- Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes [5.081241420920605]
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty.
We consider MDPs with discounted-sum payoff and failure states, which represent catastrophic outcomes.
Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP.
arXiv Detail & Related papers (2020-02-27T13:36:36Z)
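For reference, the baseline that the Constrained Risk-Averse Markov Decision Processes entry says its results generalize, and the dual LP formulation to which the Cautious Reinforcement Learning entry adds its caution penalty, is the classical occupancy-measure linear program for (constrained) MDPs with total discounted expected cost. A sketch in our own notation, with initial distribution μ0, discount γ, cost c, constraint cost d, and budget D (none of these symbols are taken from the papers):

\[
\begin{aligned}
\min_{\lambda \ge 0} \quad & \sum_{s,a} \lambda(s,a)\, c(s,a) \\
\text{s.t.} \quad & \sum_{a} \lambda(s',a) = \mu_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \lambda(s,a) \quad \forall s', \\
& \sum_{s,a} \lambda(s,a)\, d(s,a) \le D,
\end{aligned}
\]

where \(\lambda(s,a)\) is the discounted state-action occupancy measure and an optimal stationary policy is recovered as \(\pi(a \mid s) \propto \lambda(s,a)\). Per the abstracts above, replacing the linear expected-cost objective and constraint with dynamic coherent risk measures is what those papers generalize, and it is also what makes the resulting problems no longer plain LPs.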