Risk-Averse Decision Making Under Uncertainty
- URL: http://arxiv.org/abs/2109.04082v1
- Date: Thu, 9 Sep 2021 07:52:35 GMT
- Title: Risk-Averse Decision Making Under Uncertainty
- Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray,
and Aaron D. Ames
- Abstract summary: A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
- Score: 18.467950783426947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large class of decision making under uncertainty problems can be described
via Markov decision processes (MDPs) or partially observable MDPs (POMDPs),
with application to artificial intelligence and operations research, among
others. Traditionally, policy synthesis techniques are proposed such that a
total expected cost or reward is minimized or maximized. However, optimality in
the total expected cost sense is only reasonable if system behavior over a
large number of runs is of interest, which has limited the use of such policies
in practical mission-critical scenarios, wherein large deviations from the
expected behavior may lead to mission failure. In this paper, we consider the
problem of designing policies for MDPs and POMDPs with objectives and
constraints in terms of dynamic coherent risk measures, which we refer to as
the constrained risk-averse problem. For MDPs, we reformulate the problem into
an inf-sup problem via the Lagrangian framework and propose an optimization-based
method to synthesize Markovian policies. We further demonstrate that the
formulated optimization problems are in the form of difference convex programs
(DCPs) and can be solved by the disciplined convex-concave programming (DCCP)
framework. We show that these results generalize linear programs for
constrained MDPs with total discounted expected costs and constraints. For
POMDPs, we show that, if the coherent risk measures can be defined as a Markov
risk transition mapping, an infinite-dimensional optimization can be used to
design Markovian belief-based policies. For stochastic finite-state controllers
(FSCs), we show that the latter optimization simplifies to a
(finite-dimensional) DCP and can be solved by the DCCP framework. We
incorporate these DCPs in a policy iteration algorithm to design risk-averse
FSCs for POMDPs.
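The abstract states that the risk-averse formulations take the form of difference convex programs (DCPs) solvable with the disciplined convex-concave programming (DCCP) framework. For orientation only, the sketch below follows the standard toy example of the open-source `dccp` extension of CVXPY (maximizing a convex norm over a box, which is not DCP but is DCCP); it is not the paper's risk-averse program, and the exact API may vary across package versions.

```python
# Minimal DCCP sketch (assumes `pip install cvxpy dccp`): a toy
# difference-of-convex problem, NOT the paper's risk-averse MDP program.
import cvxpy as cp
import dccp

x = cp.Variable(2)
y = cp.Variable(2)

# Maximizing a convex function (a norm) over a box is not a valid DCP,
# but it is a valid DCCP problem handled by the convex-concave procedure.
prob = cp.Problem(cp.Maximize(cp.norm(x - y, 2)),
                  [0 <= x, x <= 1, 0 <= y, y <= 1])

print("is DCP :", prob.is_dcp())        # False
print("is DCCP:", dccp.is_dccp(prob))   # True

# DCCP linearizes the concave parts and solves a sequence of convex
# subproblems; the result is a stationary point, not a global optimum.
prob.solve(method="dccp")
print("x =", x.value)
print("y =", y.value)
```

In the constrained risk-averse setting described above, the difference-of-convex structure would come from the coherent risk measure and the Lagrangian terms rather than from a toy norm objective.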
Related papers
- Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form [26.01796404477275]
This paper presents the first algorithm capable of identifying a near-optimal policy in a robust constrained MDP (RCMDP).
An optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments.
arXiv Detail & Related papers (2024-08-29T06:37:16Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - A safe exploration approach to constrained Markov decision processes [7.036452261968767]
We consider discounted infinite horizon constrained Markov decision processes (CMDPs)
The goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints.
Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing a model-free and simulator-free algorithm.
arXiv Detail & Related papers (2023-12-01T13:16:39Z) - Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between texttts rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP)
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Constrained Risk-Averse Markov Decision Processes [18.467950783426947]
We consider the problem of designing policies for Markov decision processes with dynamic coherent risk objectives and constraints.
We propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem.
We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints (a standard form of that linear program is sketched after this list).
arXiv Detail & Related papers (2020-10-10T01:53:37Z)
- Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty [9.246374019271935]
We propose to merge the theory of constrained Markov decision processes (CMDPs) with the theory of robust Markov decision processes (RMDPs).
This formulation allows us to design RL algorithms that are robust in performance, and provides constraint satisfaction guarantees.
We first propose the general problem formulation under the concept of RCMDP, and then propose a Lagrangian formulation of the optimal problem, leading to a robust-constrained policy gradient RL algorithm.
arXiv Detail & Related papers (2020-10-10T01:53:37Z)
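Both the abstract above and the "Constrained Risk-Averse Markov Decision Processes" entry state that the proposed DCPs generalize the linear programs used for constrained MDPs with total discounted expected costs and constraints. For reference, a standard occupancy-measure form of that linear program is sketched below; the notation ($\gamma$, $\mu_0$, costs $c$ and $d$, budget $\beta$) is assumed for illustration and is not taken from the paper.

```latex
% Standard LP for a constrained MDP over the discounted occupancy measure
% \rho(s,a) (illustrative notation, not the paper's):
\begin{align*}
\min_{\rho \ge 0} \quad & \sum_{s,a} \rho(s,a)\, c(s,a) \\
\text{s.t.} \quad & \sum_{a} \rho(s',a) - \gamma \sum_{s,a} P(s' \mid s,a)\, \rho(s,a) = \mu_0(s') \quad \forall s', \\
& \sum_{s,a} \rho(s,a)\, d(s,a) \le \beta .
\end{align*}
% An optimal policy is recovered as \pi(a \mid s) = \rho(s,a) / \sum_{a'} \rho(s,a').
% Per the abstract, replacing the expectations with dynamic coherent risk
% measures turns programs of this kind into difference convex programs.
```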
This list is automatically generated from the titles and abstracts of the papers on this site.