Related papers: Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

URL: http://arxiv.org/abs/2310.04288v1
Date: Fri, 6 Oct 2023 14:45:57 GMT
Title: Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning
Authors: Kristina Miller, Christopher K. Zeitler, William Shen, Kerianne Hobbs, Sayan Mitra, John Schierman, Mahesh Viswanathan
Abstract summary: runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it.
Score: 2.422636931175853
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup (or safety) controller. The relevant computational design problem is to create a logic that assures safety by switching to the safety controller as needed, while maximizing some performance criteria, such as the utilization of the untrusted controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning. It can guarantee safety and leverage machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.

Related papers

Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models [57.006252510102506]
Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical applications.<n>We introduce a novel recovery-based shielding framework that enables safe RL with a provable safety lower bound for unknown and non-linear continuous dynamical systems.
arXiv Detail & Related papers (2026-02-12T22:03:35Z)
Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL) We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems [15.863561935347692]
We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints. We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
arXiv Detail & Related papers (2024-03-06T19:39:20Z)
Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent. Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control. The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL. We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems. A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z)
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers. We then present the pointwise feasibility conditions of the resulting safety controller. We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem. We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.