Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization
under Model Uncertainty
- URL: http://arxiv.org/abs/2010.04870v1
- Date: Sat, 10 Oct 2020 01:53:37 GMT
- Title: Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization
under Model Uncertainty
- Authors: Reazul Hasan Russel and Mouhacine Benosman and Jeroen Van Baar
- Abstract summary: We propose to merge the theory of constrained Markov decision process (CMDP) with the theory of robust Markov decision process (RMDP)
This formulation allows us to design RL algorithms that are robust in performance, and provides constraint satisfaction guarantees.
We first propose the general problem formulation under the concept of RCMDP, and then propose a Lagrangian formulation of the optimal problem, leading to a robust-constrained policy gradient RL algorithm.
- Score: 9.246374019271935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the problem of robustifying reinforcement learning
(RL) algorithms with respect to model uncertainties. Indeed, in the framework
of model-based RL, we propose to merge the theory of constrained Markov
decision process (CMDP), with the theory of robust Markov decision process
(RMDP), leading to a formulation of robust constrained-MDPs (RCMDP). This
formulation, simple in essence, allows us to design RL algorithms that are
robust in performance, and provides constraint satisfaction guarantees, with
respect to uncertainties in the system's states transition probabilities. The
need for RCMPDs is important for real-life applications of RL. For instance,
such formulation can play an important role for policy transfer from simulation
to real world (Sim2Real) in safety critical applications, which would benefit
from performance and safety guarantees which are robust w.r.t model
uncertainty. We first propose the general problem formulation under the concept
of RCMDP, and then propose a Lagrangian formulation of the optimal problem,
leading to a robust-constrained policy gradient RL algorithm. We finally
validate this concept on the inventory management problem.
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk [23.63388546004777]
We analyze the robustness of CVaR-based risk-sensitive RL under Robust Markov Decision Processes.
Motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets.
arXiv Detail & Related papers (2024-05-02T20:28:49Z) - Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: textttRS-DisRL-M, a model-based strategy for model-based function approximation, and textttRS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Towards Deployment-Efficient Reinforcement Learning: Lower Bound and
Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL)
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy.
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs)
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable
Policy Optimization under Model Uncertainty [9.597991494047871]
We propose to unite two frameworks resulting in robust constrained MDPs (RCMDPs)
The motivation is to develop a framework that can satisfy safety constraints while also simultaneously offer robustness to model uncertainties.
We also independently propose Lyapunov based reward shaping for RCMDPs, yielding better stability and convergence properties.
arXiv Detail & Related papers (2021-08-05T16:10:28Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO)
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.