Value Functions are Control Barrier Functions: Verification of Safe
Policies using Control Theory
- URL: http://arxiv.org/abs/2306.04026v4
- Date: Tue, 5 Dec 2023 10:47:31 GMT
- Title: Value Functions are Control Barrier Functions: Verification of Safe
Policies using Control Theory
- Authors: Daniel C.H. Tan and Fernando Acero and Robert McCarthy and Dimitrios
Kanoulas and Zhibin Li
- Abstract summary: We propose a new approach to apply verification methods from control theory to learned value functions.
We formalize original theorems that establish links between value functions and control barrier functions.
Our work marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems.
- Score: 46.85103495283037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Guaranteeing safe behaviour of reinforcement learning (RL) policies poses
significant challenges for safety-critical applications, despite RL's
generality and scalability. To address this, we propose a new approach to apply
verification methods from control theory to learned value functions. By
analyzing task structures for safety preservation, we formalize original
theorems that establish links between value functions and control barrier
functions. Further, we propose novel metrics for verifying value functions in
safe control tasks and practical implementation details to improve learning.
Our work presents a novel method for certificate learning, which unlocks a
diversity of verification techniques from control theory for RL policies, and
marks a significant step towards a formal framework for the general, scalable,
and verifiable design of RL-based control systems. Code and videos are
available at https://rl-cbf.github.io/
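The abstract does not spell out the verification procedure itself, so the snippet below is only a minimal sketch of the general idea of treating a learned value function as a candidate control barrier function: shift the value function by a safety threshold and empirically check a discrete-time CBF-style decrease condition on sampled closed-loop transitions. The threshold, the condition h(s') >= (1 - alpha) h(s), and all function names are illustrative assumptions, not the paper's exact formulation or metrics.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's exact method): treat a learned
# value function V as a candidate barrier h(s) = V(s) - c, declare states with
# h(s) >= 0 "certified safe", and empirically check the discrete-time CBF-style
# condition h(s') >= (1 - alpha) * h(s) on sampled closed-loop transitions.

def candidate_barrier(value_fn, threshold):
    """Turn a value function into a candidate barrier h(s) = V(s) - threshold."""
    return lambda s: value_fn(s) - threshold

def cbf_violation_rate(h, transitions, alpha=0.1):
    """Fraction of transitions from the certified set violating the CBF condition."""
    violations, checked = 0, 0
    for s, s_next in transitions:
        if h(s) < 0.0:                      # only start from certified states
            continue
        checked += 1
        if h(s_next) < (1.0 - alpha) * h(s):
            violations += 1
    return violations / max(checked, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    value_fn = lambda s: 1.0 - float(np.linalg.norm(s))    # toy "learned" V
    h = candidate_barrier(value_fn, threshold=0.2)
    states = rng.uniform(-1.0, 1.0, size=(1000, 2))
    transitions = [(s, 0.9 * s) for s in states]           # toy contracting dynamics
    print("empirical CBF violation rate:", cbf_violation_rate(h, transitions))
```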
Related papers
- Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems [2.126171264016785]
We propose Adaptive Regularization (RL-AR), an algorithm that enables safe RL exploration.
RL-AR performs policy combination via a "focus module," which determines the appropriate combination depending on the state.
In a series of critical control applications, we demonstrate that RL-AR not only ensures safety during training but also achieves returns competitive with standard model-free RL.
arXiv Detail & Related papers (2024-04-23T16:35:14Z)
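The "focus module" in the RL-AR summary is described only at a high level; the following is a hedged sketch of the general pattern of state-dependent policy combination. The visit-count gating, the weighting formula, and all names are assumptions made for illustration, not the RL-AR algorithm.

```python
import numpy as np

# Sketch of state-dependent policy combination (illustrative assumptions only):
# a focus weight w(s) in [0, 1] blends a conservative safe controller with a
# learned RL policy, leaning on the safe controller where data is scarce.

def focus_weight(state, visit_count, k=5.0):
    """Hypothetical focus rule: trust the RL policy more where data is dense."""
    n = visit_count(state)
    return n / (n + k)            # near 0 with little data, near 1 with much data

def combined_action(state, safe_policy, rl_policy, visit_count):
    w = focus_weight(state, visit_count)
    return (1.0 - w) * safe_policy(state) + w * rl_policy(state)

if __name__ == "__main__":
    safe_policy = lambda s: -0.5 * s                 # e.g. a stabilizing regulator
    rl_policy = lambda s: -0.9 * s + 0.1             # toy "learned" policy
    visit_count = lambda s: 20.0 if abs(float(s[0])) < 1.0 else 0.0
    for s in (np.array([0.3]), np.array([3.0])):
        print(s, combined_action(s, safe_policy, rl_policy, visit_count))
```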
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
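As a rough picture of look-ahead shielding with an approximate (learned) model rather than known safety-relevant dynamics, here is a minimal sketch; the horizon, the backup policy, and the veto rule are assumptions for illustration, not the paper's algorithm.

```python
# Sketch of approximate look-ahead shielding (illustrative assumptions): before
# executing the RL policy's action, roll an approximate dynamics model forward
# a few steps; if any predicted state is unsafe, fall back to a backup action.

def rollout_is_safe(state, action, model, backup_policy, is_unsafe, horizon=5):
    """True if a short model rollout that starts with `action` stays safe."""
    s = model(state, action)
    for _ in range(horizon):
        if is_unsafe(s):
            return False
        s = model(s, backup_policy(s))
    return True

def shielded_action(state, rl_policy, backup_policy, model, is_unsafe, horizon=5):
    a = rl_policy(state)
    if rollout_is_safe(state, a, model, backup_policy, is_unsafe, horizon):
        return a
    return backup_policy(state)

if __name__ == "__main__":
    model = lambda s, a: s + a                       # toy 1-D dynamics
    is_unsafe = lambda s: abs(s) > 2.0
    rl_policy = lambda s: 1.5                        # aggressive proposal
    backup_policy = lambda s: -0.5 * s               # conservative fallback
    print(shielded_action(1.0, rl_policy, backup_policy, model, is_unsafe))
```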
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
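The pointwise feasibility idea can be pictured with a toy discrete-time safety filter: check whether any candidate action satisfies a CBF-style constraint at the current state, and flag the state as infeasible when none does, the kind of event that could trigger collecting more data. Everything below, including the finite candidate set used in place of a quadratic program, is an illustrative assumption rather than the paper's controller.

```python
# Sketch (illustrative assumptions): a discrete-time CBF safety filter over a
# finite candidate action set. An empty feasible set means the filter is
# pointwise infeasible at this state -- the kind of event that could trigger
# online data collection to shrink model uncertainty.

def feasible_actions(state, candidates, model, h, alpha=0.1):
    """Candidates whose predicted next state satisfies h(s') >= (1 - alpha) * h(s)."""
    return [a for a in candidates
            if h(model(state, a)) >= (1.0 - alpha) * h(state)]

def safety_filter(state, desired_action, candidates, model, h, alpha=0.1):
    feas = feasible_actions(state, candidates, model, h, alpha)
    if not feas:
        return None                       # infeasible: request more data / fallback
    return min(feas, key=lambda a: abs(a - desired_action))

if __name__ == "__main__":
    model = lambda s, a: s + a                      # toy 1-D dynamics
    h = lambda s: 1.0 - abs(s)                      # safe set: |s| <= 1
    candidates = [-0.4, -0.2, 0.0, 0.2, 0.4]
    print(safety_filter(0.8, desired_action=0.4, candidates=candidates,
                        model=model, h=h))
```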
- Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)
- Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
- Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning [7.658716383823426]
A valid safety certificate is an energy function indicating that safe states have low energy.
Existing learning-based studies treat either the safety certificate or the safe control policy as prior knowledge in order to learn the other.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL.
arXiv Detail & Related papers (2021-11-15T12:05:44Z)
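A hedged sketch of what jointly synthesizing an energy-function certificate and a policy can involve: penalize certificate violations along transitions generated by the current policy while the policy itself is being updated. The hinge-style loss and margin below are illustrative assumptions, not the paper's constrained-RL formulation.

```python
import numpy as np

# Sketch (illustrative assumptions): a certificate-violation loss for a learned
# energy function E(s). Safe states should have low energy, and along transitions
# produced by the current policy the energy should not increase by more than a
# margin; violations contribute a hinge penalty that the learner minimizes.

def certificate_loss(E, transitions, margin=0.01):
    """Mean hinge penalty for violating E(s') <= E(s) + margin on sampled transitions."""
    penalties = [max(0.0, E(s_next) - E(s) - margin) for s, s_next in transitions]
    return float(np.mean(penalties))

if __name__ == "__main__":
    E = lambda s: float(np.dot(s, s))                # toy quadratic energy
    rng = np.random.default_rng(0)
    states = rng.normal(size=(100, 2))
    transitions = [(s, 0.95 * s) for s in states]    # toy dissipative closed loop
    print("certificate loss:", certificate_loss(E, transitions))
```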
- Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications.
We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology.
We introduce a user interface (UI) developed to help a user provide intent specifications to the system and inspect differences in the agent's proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z)
- Safe Reinforcement Learning Using Robust Action Governor [6.833157102376731]
Reinforcement Learning (RL) is essentially a trial-and-error learning procedure which may cause unsafe behavior during the exploration-and-exploitation process.
In this paper, we introduce a framework for safe RL that is based on integration of an RL algorithm with an add-on safety supervision module.
We illustrate this proposed safe RL framework through an application to automotive adaptive cruise control.
arXiv Detail & Related papers (2021-02-21T16:50:17Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
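To make the LTL-goal idea concrete without reproducing the paper's automaton construction, here is a toy monitor for the co-safe formula "avoid hazard until the goal is reached"; its accept/reject outcome is the kind of signal such algorithms turn into reward. The three-state monitor and the atomic propositions are illustrative assumptions.

```python
# Sketch (illustrative assumptions): a tiny monitor for the co-safe LTL-style
# requirement "(not hazard) until goal" over finite traces; each step consumes
# the set of atomic propositions true in the current state.

ACCEPT, REJECT, PENDING = "accept", "reject", "pending"

def monitor_step(q, labels):
    """Advance the monitor automaton on one labelled state."""
    if q != PENDING:
        return q                       # accept and reject are absorbing
    if "goal" in labels:
        return ACCEPT
    if "hazard" in labels:
        return REJECT
    return PENDING

def trace_satisfies(trace_labels):
    q = PENDING
    for labels in trace_labels:
        q = monitor_step(q, labels)
    return q == ACCEPT

if __name__ == "__main__":
    print(trace_satisfies([set(), set(), {"goal"}]))         # True
    print(trace_satisfies([set(), {"hazard"}, {"goal"}]))    # False
```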
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.