Runtime-Safety-Guided Policy Repair
- URL: http://arxiv.org/abs/2008.07667v1
- Date: Mon, 17 Aug 2020 23:31:48 GMT
- Title: Runtime-Safety-Guided Policy Repair
- Authors: Weichao Zhou, Ruihan Gao, BaekGyu Kim, Eunsuk Kang, Wenchao Li
- Abstract summary: We study the problem of policy repair for learning-based control policies in safety-critical settings.
We propose to reduce or even eliminate control switching by `repairing' the trained policy based on runtime data produced by the safety controller.
- Score: 13.038017178545728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of policy repair for learning-based control policies in
safety-critical settings. We consider an architecture where a high-performance
learning-based control policy (e.g. one trained as a neural network) is paired
with a model-based safety controller. The safety controller is endowed with the
abilities to predict whether the trained policy will lead the system to an
unsafe state, and take over control when necessary. While this architecture can
provide added safety assurances, intermittent and frequent switching between
the trained policy and the safety controller can result in undesirable
behaviors and reduced performance. We propose to reduce or even eliminate
control switching by `repairing' the trained policy based on runtime data
produced by the safety controller in a way that deviates minimally from the
original policy. The key idea behind our approach is the formulation of a
trajectory optimization problem that allows the joint reasoning of policy
update and safety constraints. Experimental results demonstrate that our
approach is effective even when the system model in the safety controller is
unknown and only approximated.
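To make the architecture concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation): a shielded controller that logs safety-controller takeovers at runtime, plus a simplified supervised surrogate for the repair step that fits the logged safe actions while penalizing deviation from the original policy parameters. The `predicts_unsafe` / `safe_action` interface and the linear-policy repair are illustrative assumptions; the paper itself formulates repair as a trajectory optimization problem that jointly reasons about the policy update and the safety constraints.

```python
import numpy as np

class ShieldedController:
    """Run a learned policy, letting a model-based safety controller take over
    whenever it predicts the learned action would lead to an unsafe state.
    Runtime data produced during takeovers is logged for later policy repair."""

    def __init__(self, policy, safety_controller):
        self.policy = policy                  # learned policy: state -> action
        self.safety = safety_controller       # model-based safety controller
        self.takeover_log = []                # (state, safe_action) pairs

    def act(self, state):
        action = self.policy(state)
        if self.safety.predicts_unsafe(state, action):   # hypothetical interface
            action = self.safety.safe_action(state)      # safety controller takes over
            self.takeover_log.append((state, action))
        return action


def repair_policy(policy_params, features, takeover_log, reg=1.0, lr=1e-2, steps=200):
    """Supervised surrogate for policy repair: nudge a linear policy toward the
    safety controller's actions on logged takeover states while penalizing
    deviation from the original parameters (a stand-in for the paper's
    trajectory-optimization formulation)."""
    theta0 = policy_params.copy()             # original parameters, shape (d, m)
    theta = policy_params.copy()
    X = np.array([features(s) for s, _ in takeover_log])   # state features, (N, d)
    U = np.array([a for _, a in takeover_log])              # safe actions, (N, m)
    for _ in range(steps):
        resid = X @ theta - U                                # imitation error on takeover data
        grad = X.T @ resid / len(X) + reg * (theta - theta0) # minimal-deviation penalty
        theta -= lr * grad
    return theta
```

The regularization weight `reg` plays the role of the "deviate minimally from the original policy" objective: larger values keep the repaired policy closer to the original, smaller values weight the safety controller's runtime data more heavily.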
Related papers
- Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
arXiv Detail & Related papers (2022-12-06T18:53:34Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill
Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - Safe Reinforcement Learning with Chance-constrained Model Predictive
Control [10.992151305603267]
Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints.
We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy gradient framework.
We show theoretically that the penalty introduced during training allows the safety guide to be removed after training, and we illustrate our method with experiments on a simulated quadrotor.
arXiv Detail & Related papers (2021-12-27T23:47:45Z) - Model-Based Safe Reinforcement Learning with Time-Varying State and
Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and to guide safe policy updates.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
- Towards Safe Continuing Task Reinforcement Learning [21.390201009230246]
We propose an algorithm capable of operating in the continuing task setting without the need for restarts.
We evaluate our approach on a numerical example, which demonstrates its ability to learn safe policies via safe exploration.
arXiv Detail & Related papers (2021-02-24T22:12:25Z)
- Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert (a rough sketch of this expert-imitation setup appears after this list).
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
- Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z)
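As a rough illustration of the expert-imitation idea referenced in the safe imitation learning entry above, the sketch below shows a generic DAgger-style loop that distills a safe optimization-based expert (e.g. an MPC controller) into a cheap-to-evaluate policy while collecting labels along the learner's own closed-loop trajectories. All names here (`expert_action`, `rollout_env`, `fit_policy`) are hypothetical callables for illustration, not an API from the cited paper.

```python
import numpy as np

def dagger_from_safe_expert(expert_action, rollout_env, fit_policy, features,
                            init_policy, iters=5, horizon=100):
    """DAgger-style imitation of a safe optimization-based expert: the learner
    drives the rollout (so training data matches its closed-loop distribution),
    the expert labels every visited state, and the policy is refit on the
    aggregated dataset. Illustrative sketch under assumed interfaces."""
    policy = init_policy
    X, U = [], []                                   # aggregated (features, expert action) data
    for _ in range(iters):
        state = rollout_env.reset()
        for _ in range(horizon):
            X.append(features(state))
            U.append(expert_action(state))          # safe expert labels the visited state
            state, done = rollout_env.step(policy(state))  # but the learner chooses the action
            if done:
                break
        policy = fit_policy(np.array(X), np.array(U))       # refit on all data so far
    return policy
```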