Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical
Systems
- URL: http://arxiv.org/abs/2403.04007v1
- Date: Wed, 6 Mar 2024 19:39:20 GMT
- Title: Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical
Systems
- Authors: Wesley A. Suttle, Vipul K. Sharma, Krishna C. Kosaraju, S.
Sivaranjani, Ji Liu, Vijay Gupta, Brian M. Sadler
- Abstract summary: We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems.
Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints.
We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
- Score: 15.863561935347692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop provably safe and convergent reinforcement learning (RL)
algorithms for control of nonlinear dynamical systems, bridging the gap between
the hard safety guarantees of control theory and the convergence guarantees of
RL theory. Recent advances at the intersection of control and RL follow a
two-stage, safety filter approach to enforcing hard safety constraints:
model-free RL is used to learn a potentially unsafe controller, whose actions
are projected onto safe sets prescribed, for example, by a control barrier
function. Though safe, such approaches lose any convergence guarantees enjoyed
by the underlying RL methods. In this paper, we develop a single-stage,
sampling-based approach to hard constraint satisfaction that learns RL
controllers enjoying classical convergence guarantees while satisfying hard
safety constraints throughout training and deployment. We validate the efficacy
of our approach in simulation, including safe control of a quadcopter in a
challenging obstacle avoidance problem, and demonstrate that it outperforms
existing benchmarks.
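To make the two-stage "safety filter" baseline described in the abstract concrete: a model-free RL action is minimally corrected so that it satisfies a control barrier function (CBF) condition. The sketch below is illustrative only, using an assumed toy single-integrator system with a disk-shaped safe set; the dynamics, barrier function, and gain `alpha` are hypothetical choices, and this shows the two-stage baseline the paper contrasts against, not the paper's single-stage sampling-based method.

```python
# Minimal sketch of a CBF-based safety filter (the two-stage baseline).
# Toy system (assumed for illustration): single integrator x' = u, with
# safe set h(x) = radius^2 - ||x||^2 >= 0.
import numpy as np

def cbf_filter(x, u_rl, radius=1.0, alpha=1.0):
    """Project an RL action onto the half-space allowed by the CBF condition.

    The CBF condition grad_h(x) . u + alpha * h(x) >= 0 is a single affine
    constraint a . u >= b here, so the minimal-norm correction has a
    closed form instead of requiring a QP solver.
    """
    h = radius**2 - np.dot(x, x)       # barrier value (>= 0 means safe)
    a = -2.0 * x                       # grad_h(x) for this toy barrier
    b = -alpha * h                     # constraint: a . u >= b
    slack = np.dot(a, u_rl) - b
    if slack >= 0.0 or np.dot(a, a) < 1e-12:
        return u_rl                    # RL action already satisfies the CBF
    return u_rl + (-slack) * a / np.dot(a, a)  # minimal correction onto the boundary

# Usage: an action pushing the state out of the safe disk gets corrected.
x = np.array([0.9, 0.0])
u_unsafe = np.array([1.0, 0.0])
print(cbf_filter(x, u_unsafe))
```

As the abstract notes, such a projection enforces hard safety constraints but generally breaks the convergence guarantees of the underlying RL method, which is the gap the paper's single-stage approach targets.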
Related papers
- Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward.
arXiv Detail & Related papers (2024-05-04T20:59:06Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - Stable and Safe Reinforcement Learning via a Barrier-Lyapunov
Actor-Critic Approach [1.8924647429604111]
The Barrier-Lyapunov Actor-Critic (BLAC) framework maintains both safety and stability for the system.
An additional backup controller is introduced in case the RL-based controller cannot provide valid control signals.
arXiv Detail & Related papers (2023-04-08T16:48:49Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Safe Model-Based Reinforcement Learning with an Uncertainty-Aware
Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safety rate, as measured in simulation.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Sample-efficient Safe Learning for Online Nonlinear Control with Control
Barrier Functions [35.9713619595494]
Reinforcement learning and continuous nonlinear control have been successfully deployed in multiple domains involving complex sequential decision-making.
Given the exploratory nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks.
We propose a provably efficient episodic safe learning framework for online control tasks.
arXiv Detail & Related papers (2022-07-29T00:54:35Z) - Model-Based Safe Reinforcement Learning with Time-Varying State and
Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.