Constrained Risk-Averse Markov Decision Processes
- URL: http://arxiv.org/abs/2012.02423v2
- Date: Sun, 28 Mar 2021 23:45:15 GMT
- Title: Constrained Risk-Averse Markov Decision Processes
- Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray,
and Aaron D. Ames
- Abstract summary: We consider the problem of designing policies for Markov decision processes with dynamic coherent risk objectives and constraints.
We propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem.
We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints.
- Score: 18.467950783426947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of designing policies for Markov decision processes
(MDPs) with dynamic coherent risk objectives and constraints. We begin by
formulating the problem in a Lagrangian framework. Under the assumption that
the risk objectives and constraints can be represented by a Markov risk
transition mapping, we propose an optimization-based method to synthesize
Markovian policies that lower-bound the constrained risk-averse problem. We
demonstrate that the formulated optimization problems are in the form of
difference convex programs (DCPs) and can be solved by the disciplined
convex-concave programming (DCCP) framework. We show that these results
generalize linear programs for constrained MDPs with total discounted expected
costs and constraints. Finally, we illustrate the effectiveness of the proposed
method with numerical experiments on a rover navigation problem involving
conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent
risk measures.
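The abstract notes that the synthesized problems are difference convex programs (DCPs) handled by the disciplined convex-concave programming (DCCP) framework. As a minimal, hedged sketch of that workflow (assuming the open-source cvxpy and dccp Python packages, and using a toy objective that is not the paper's actual risk-averse program):

```python
# Minimal sketch of the DCCP workflow named in the abstract.
# Assumptions: the open-source `cvxpy` and `dccp` packages are installed;
# the toy objective below is illustrative, NOT the paper's program.
import cvxpy as cp
import dccp  # importing dccp registers the "dccp" solve method with cvxpy

x = cp.Variable(2)
y = cp.Variable(2)

# Maximizing a convex function (a norm) is not DCP-compliant, but it has
# difference-of-convex structure, so the DCCP heuristic can handle it.
prob = cp.Problem(cp.Maximize(cp.norm(x - y, 2)),
                  [0 <= x, x <= 1, 0 <= y, y <= 1])

print("is DCP: ", prob.is_dcp())       # False
print("is DCCP:", dccp.is_dccp(prob))  # True
prob.solve(method="dccp")              # convex-concave procedure
print("x =", x.value, "y =", y.value)
```

In the paper's setting the difference-of-convex structure would instead come from the Lagrangian of the coherent-risk objective and constraints; the sketch only illustrates the solve(method="dccp") entry point.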
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Domain Generalization without Excess Empirical Risk [83.26052467843725]
A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty.
We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization.
We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk.
arXiv Detail & Related papers (2023-08-30T08:46:46Z) - Regret Bounds for Markov Decision Processes with Recursive Optimized
Certainty Equivalents [3.8980564330208662]
We propose a new episodic risk-sensitive reinforcement learning formulation.
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
arXiv Detail & Related papers (2023-01-30T01:22:31Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - A policy gradient approach for optimization of smooth risk measures [8.087699764574788]
We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward.
We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings.
arXiv Detail & Related papers (2022-02-22T17:26:28Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Risk Conditioned Neural Motion Planning [14.018786843419862]
Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks.
We propose an extension of the soft actor-critic model to estimate the execution risk of a plan through a risk critic.
We show the advantage of our model in terms of both computational time and plan quality, compared to a state-of-the-art mathematical programming baseline.
arXiv Detail & Related papers (2021-08-04T05:33:52Z) - Risk-Averse Stochastic Shortest Path Planning [25.987787625028204]
We show that optimal, stationary, Markovian policies exist and can be found via a special Bellman equation; a generic sketch of such a risk-averse backup appears after this list.
A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
arXiv Detail & Related papers (2021-03-26T20:49:14Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm, CVaR-TS, under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
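Both the main abstract and the Risk-Averse Stochastic Shortest Path entry above evaluate dynamic coherent risk measures through a Bellman-style backup. The following sketch is a generic illustration, on a small made-up tabular MDP, of value iteration with a nested (dynamic) CVaR backup; it assumes hypothetical costs and transitions and is not the specific Bellman equation or lower-bounding program from either paper.

```python
# Hedged sketch: value iteration with a nested CVaR backup on a tiny,
# made-up MDP. Illustrates a dynamic coherent-risk Bellman operator in
# general, not the exact formulation of the papers listed above.
import numpy as np

def cvar_discrete(values, probs, alpha):
    """CVaR_alpha of a discrete cost distribution: mean of the worst
    alpha-fraction of outcomes (alpha=1 gives the expectation, alpha->0
    the worst case). Dual form: max E_q[values] s.t. q <= probs / alpha."""
    order = np.argsort(values)[::-1]              # largest costs first
    v, p = values[order], probs[order]
    cap = p / alpha                               # per-outcome mass ceiling
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(cap)[:-1]))
    q = np.minimum(cap, np.clip(remaining, 0.0, None))
    return float(q @ v)

def risk_averse_value_iteration(P, C, gamma=0.95, alpha=0.2, iters=500):
    """P: (S, A, S) transition kernel; C: (S, A) immediate costs.
    Backup: V(s) <- min_a [ C(s,a) + gamma * CVaR_alpha_{s'~P(.|s,a)}(V(s')) ]."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[C[s, a] + gamma * cvar_discrete(V, P[s, a], alpha)
                       for a in range(A)] for s in range(S)])
        V = Q.min(axis=1)
    return V, Q.argmin(axis=1)

# Toy 3-state, 2-action MDP (hypothetical numbers, illustration only).
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]],
              [[0.5, 0.5, 0.0], [0.0, 0.9, 0.1]],
              [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
C = np.array([[1.0, 0.2],
              [0.5, 2.0],
              [0.0, 0.0]])

V, policy = risk_averse_value_iteration(P, C)
print("risk-averse values:", np.round(V, 3), "greedy policy:", policy)
```

One way to obtain the EVaR variant mentioned in the experiments would be to swap cvar_discrete for an entropic-value-at-risk evaluation of the one-step distribution; the backup structure stays the same.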