Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees
- URL: http://arxiv.org/abs/2301.13375v2
- Date: Thu, 28 Mar 2024 16:08:43 GMT
- Title: Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees
- Authors: James Queeney, Erhan Can Ozcan, Ioannis Ch. Paschalidis, Christos G. Cassandras,
- Abstract summary: We introduce a safe reinforcement learning framework that incorporates robustness through the use of an optimal transport cost uncertainty set.
In experiments on continuous control tasks with safety constraints, our approach demonstrates robust performance while significantly improving safety at deployment time.
- Score: 14.107064796593225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning. Real-world decision making applications require algorithms that can guarantee robust performance and safety in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In order to accomplish this goal, we introduce a safe reinforcement learning framework that incorporates robustness through the use of an optimal transport cost uncertainty set. We provide an efficient implementation based on applying Optimal Transport Perturbations to construct worst-case virtual state transitions, which does not impact data collection during training and does not require detailed simulator access. In experiments on continuous control tasks with safety constraints, our approach demonstrates robust performance while significantly improving safety at deployment time compared to standard safe reinforcement learning.
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Searching for Optimal Runtime Assurance via Reachability and
Reinforcement Learning [2.422636931175853]
runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup controller.
Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations.
In this paper, we formulate the optimal RTA design problem and present a new approach for solving it.
arXiv Detail & Related papers (2023-10-06T14:45:57Z) - Iterative Reachability Estimation for Safe Reinforcement Learning [23.942701020636882]
We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained reinforcement learning (RL) environments.
In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety.
We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo.
arXiv Detail & Related papers (2023-09-24T02:36:42Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Adaptive control of a mechatronic system using constrained residual
reinforcement learning [0.0]
We propose a simple, practical and intuitive approach to improve the performance of a conventional controller in uncertain environments.
Our approach is motivated by the observation that conventional controllers in industrial motion control value robustness over adaptivity to deal with different operating conditions.
arXiv Detail & Related papers (2021-10-06T08:13:05Z) - Context-Aware Safe Reinforcement Learning for Non-Stationary
Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL)
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.