Related papers: Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

URL: http://arxiv.org/abs/2108.02701v1
Date: Thu, 5 Aug 2021 16:10:28 GMT
Title: Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty
Authors: Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar, Radu Corcodel
Abstract summary: We propose to unite two frameworks resulting in robust constrained MDPs (RCMDPs) The motivation is to develop a framework that can satisfy safety constraints while also simultaneously offer robustness to model uncertainties. We also independently propose Lyapunov based reward shaping for RCMDPs, yielding better stability and convergence properties.
Score: 9.597991494047871
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safety and robustness are two desired properties for any reinforcement learning algorithm. CMDPs can handle additional safety constraints and RMDPs can perform well under model uncertainties. In this paper, we propose to unite these two frameworks resulting in robust constrained MDPs (RCMDPs). The motivation is to develop a framework that can satisfy safety constraints while also simultaneously offer robustness to model uncertainties. We develop the RCMDP objective, derive gradient update formula to optimize this objective and then propose policy gradient based algorithms. We also independently propose Lyapunov based reward shaping for RCMDPs, yielding better stability and convergence properties.

Related papers

Efficient Solution and Learning of Robust Factored MDPs [57.2416302384766]
Learning r-MDPs from interactions with an unknown environment enables the synthesis of robust policies with provable guarantees on performance.<n>We propose novel methods for solving and learning r-MDPs based on factored state representations.
arXiv Detail & Related papers (2025-08-01T15:23:15Z)
An Optimistic Algorithm for online CMDPS with Anytime Adversarial Constraints [7.275101606364466]
Online safe reinforcement learning (RL) plays a key role in dynamic environments, with applications in autonomous driving, robotics, and cybersecurity.<n>The objective is to learn optimal policies that maximize rewards while satisfying safety constraints modeled by constrained Markov decision processes (CMDPs)<n>Existing methods achieve sublinear regret under constraints but often fail in adversarial settings, where constraints are unknown, time-varying, and potentially adversarially designed.<n>We propose the Optimistic Mirror Descent Primal-Dual (OMDPD) algorithm, the first to address online CMDPs with anytime adversarial constraints.
arXiv Detail & Related papers (2025-05-28T00:16:34Z)
Policy Gradient for Robust Markov Decision Processes [16.281897051782863]
This paper introduces a novel policy gradient method, Double-Loop Robust Policy Mirror Descent (MD), for solving robust Markov Decision Processes (MDPs) MD employs a general mirror descent update rule for the policy optimization with adaptive tolerance per iteration, guaranteeing convergence to a globally optimal policy. We provide a comprehensive analysis of MD, including new convergence results under both direct and softmax parameterizations, and provide novel insights into the inner problem solution through Transition Mirror Ascent (TMA)
arXiv Detail & Related papers (2024-10-29T15:16:02Z)
Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness [47.9744734181236]
We explore the concept of Lipschitz continuity to certify the robustness of deep neural networks (DNNs) against adversarial attacks. We propose a novel algorithm that remaps the input domain into a constrained range, reducing the Lipschitz constant and potentially enhancing robustness. Our method achieves the best robust accuracy for CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard.
arXiv Detail & Related papers (2024-06-28T03:10:36Z)
Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states. The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
Bounded Robustness in Reinforcement Learning via Lexicographic Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost. We study how policies can be maximally robust to arbitrary observational noise. We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z)
Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust. We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings. We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy.
arXiv Detail & Related papers (2021-12-31T09:50:46Z)
Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z)
Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution. This approach poses a number of implementation and optimization challenges. We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP) The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP. The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty [9.246374019271935]
We propose to merge the theory of constrained Markov decision process (CMDP) with the theory of robust Markov decision process (RMDP) This formulation allows us to design RL algorithms that are robust in performance, and provides constraint satisfaction guarantees. We first propose the general problem formulation under the concept of RCMDP, and then propose a Lagrangian formulation of the optimal problem, leading to a robust-constrained policy gradient RL algorithm.
arXiv Detail & Related papers (2020-10-10T01:53:37Z)
Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation. We then propose Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.