Learning Verifiable Control Policies Using Relaxed Verification
- URL: http://arxiv.org/abs/2504.16879v1
- Date: Wed, 23 Apr 2025 16:54:35 GMT
- Title: Learning Verifiable Control Policies Using Relaxed Verification
- Authors: Puja Chaudhury, Alexander Estornell, Michael Everett
- Abstract summary: This work proposes to perform verification throughout training, aiming for policies whose properties can be evaluated at runtime. The approach uses differentiable reachability analysis and incorporates new components into the loss function.
- Score: 49.81690518952909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To provide safety guarantees for learning-based control systems, recent work has developed formal verification methods to apply after training ends. However, if the trained policy does not meet the specifications, or there is conservatism in the verification algorithm, establishing these guarantees may not be possible. Instead, this work proposes to perform verification throughout training to ultimately aim for policies whose properties can be evaluated throughout runtime with lightweight, relaxed verification algorithms. The approach is to use differentiable reachability analysis and incorporate new components into the loss function. Numerical experiments on a quadrotor model and unicycle model highlight the ability of this approach to lead to learned control policies that satisfy desired reach-avoid and invariance specifications.
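The core idea in the abstract, propagating reachable-set bounds through the closed loop differentiably and penalizing specification violations in the training loss, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the double-integrator dynamics, interval-bound (box) reachability, box-shaped goal and avoid sets, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumed discrete-time double-integrator dynamics (illustrative, not from the paper).
A = torch.tensor([[1.0, 0.1],
                  [0.0, 1.0]])
B = torch.tensor([[0.005],
                  [0.1]])

policy = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

def interval_linear(W, b, lo, hi):
    """Propagate the box [lo, hi] through the affine map x -> W x + b."""
    c, r = (lo + hi) / 2, (hi - lo) / 2
    c2, r2 = c @ W.T + b, r @ W.abs().T
    return c2 - r2, c2 + r2

def policy_bounds(lo, hi):
    """Interval-bound propagation through the small tanh policy network."""
    for layer in policy:
        if isinstance(layer, nn.Linear):
            lo, hi = interval_linear(layer.weight, layer.bias, lo, hi)
        else:  # tanh is monotone, so it maps box endpoints to box endpoints
            lo, hi = torch.tanh(lo), torch.tanh(hi)
    return lo, hi

def reach_avoid_penalty(x_lo, x_hi, goal_lo, goal_hi, avoid_lo, avoid_hi, horizon=20):
    """Differentiable penalty on the over-approximated reachable boxes."""
    penalty = torch.zeros(())
    for _ in range(horizon):
        u_lo, u_hi = policy_bounds(x_lo, x_hi)
        ax_lo, ax_hi = interval_linear(A, torch.zeros(2), x_lo, x_hi)
        bu_lo, bu_hi = interval_linear(B, torch.zeros(2), u_lo, u_hi)
        x_lo, x_hi = ax_lo + bu_lo, ax_hi + bu_hi  # box image of A x + B u
        # volume of the overlap between the reachable box and the avoid box
        overlap = torch.relu(torch.minimum(x_hi, avoid_hi) - torch.maximum(x_lo, avoid_lo))
        penalty = penalty + overlap.prod()
    # at the final step, penalize any part of the reachable box lying outside the goal box
    penalty = penalty + torch.relu(goal_lo - x_lo).sum() + torch.relu(x_hi - goal_hi).sum()
    return penalty

# Example: one gradient step on the penalty (illustrative set definitions).
x0_lo, x0_hi = torch.tensor([-0.1, -0.1]), torch.tensor([0.1, 0.1])
goal_lo, goal_hi = torch.tensor([0.8, -0.2]), torch.tensor([1.2, 0.2])
avoid_lo, avoid_hi = torch.tensor([0.4, 0.4]), torch.tensor([0.6, 0.6])
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss = reach_avoid_penalty(x0_lo, x0_hi, goal_lo, goal_hi, avoid_lo, avoid_hi)
loss.backward()
opt.step()
```

In a full training loop, a penalty of this form would be added with a weight to the ordinary task loss, so gradient steps both improve performance and shrink the over-approximated reachable set away from the avoid region and toward the goal.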
Related papers
- Neural Control and Certificate Repair via Runtime Monitoring [7.146556437126553]
We propose a novel framework that utilizes runtime monitoring to detect system behaviors that violate the property of interest.
We demonstrate the effectiveness of our approach by using it to repair, and to boost the safety rate of, learned neural network policies.
arXiv Detail & Related papers (2024-12-17T15:15:30Z) - Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z) - Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees [6.9060054915724]
Barrier functions are a general framework for establishing a safety guarantee for a system.
There is no general method for finding these functions.
Recent approaches use self-supervised learning techniques to learn these functions.
arXiv Detail & Related papers (2024-03-12T04:29:43Z) - Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory [46.85103495283037]
We propose a new approach to apply verification methods from control theory to learned value functions.
We formalize original theorems that establish links between value functions and control barrier functions.
Our work marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems.
arXiv Detail & Related papers (2023-06-06T21:41:31Z) - Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees [20.045860624444494]
We study the problem of learning controllers for discrete-time non-linear dynamical systems with formal reach-avoid guarantees.
We learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work.
Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy.
arXiv Detail & Related papers (2022-10-11T10:02:49Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
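For context, the standard pointwise condition behind generic CBF-based safety-critical controllers takes the form of a quadratic program; the sketch below is the textbook control-affine CBF-QP, not the model-uncertainty-aware reformulation specific to that paper. Here h defines the safe set {x : h(x) >= 0} and alpha is a class-K function.

```latex
% Generic control-affine CBF quadratic program (illustrative sketch)
\begin{aligned}
u^{*}(x) \;=\; &\arg\min_{u \in \mathcal{U}} \; \| u - u_{\mathrm{nom}}(x) \|^{2} \\
&\text{s.t.} \;\; \nabla h(x)^{\top} \big( f(x) + g(x)\,u \big) \;\ge\; -\alpha\big( h(x) \big)
\end{aligned}
```

Pointwise feasibility then asks whether this constraint admits some admissible input at every state of interest, which is what an event-triggered data collection strategy can monitor online.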
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
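In symbols, "enforcing validity only along a test function space" can be read, as a hedged paraphrase of the summary rather than the paper's exact statement, as requiring the Bellman residual of Q to be orthogonal to every test function f in a chosen class F under the data distribution:

```latex
% Bellman residual orthogonality against a test-function class (paraphrased sketch)
\mathbb{E}_{(s,a,r,s') \sim \mu}\!\left[ f(s,a) \Big( Q(s,a) - r - \gamma\, Q\big(s', \pi(s')\big) \Big) \right] = 0
\qquad \text{for all } f \in \mathcal{F}
\end{aligned-free}
```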
arXiv Detail & Related papers (2022-03-24T01:04:17Z) - Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z) - Runtime-Safety-Guided Policy Repair [13.038017178545728]
We study the problem of policy repair for learning-based control policies in safety-critical settings.
We propose to reduce or even eliminate control switching by "repairing" the trained policy based on runtime data produced by the safety controller.
arXiv Detail & Related papers (2020-08-17T23:31:48Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.