Safe Reinforcement Learning using Data-Driven Predictive Control
- URL: http://arxiv.org/abs/2211.11027v1
- Date: Sun, 20 Nov 2022 17:10:40 GMT
- Title: Safe Reinforcement Learning using Data-Driven Predictive Control
- Authors: Mahmoud Selim, Amr Alanwar, M. Watheq El-Kharashi, Hazem M. Abbas,
Karl H. Johansson
- Abstract summary: We propose a data-driven safety layer that acts as a filter for unsafe actions.
The safety layer penalizes the RL agent if the proposed action is unsafe and replaces it with the closest safe one.
In simulation, we show that our method outperforms state-of-the-art safe RL methods on robot navigation problems.
- Score: 0.5459797813771499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) algorithms can achieve state-of-the-art
performance in decision-making and continuous control tasks. However, applying
RL algorithms to safety-critical systems remains difficult to justify due to the
exploratory nature of many RL algorithms, especially when the models of the
robot and the environment are unknown. To address this challenge, we
propose a data-driven safety layer that acts as a filter for unsafe actions.
The safety layer uses a data-driven predictive controller to enforce safety
guarantees for RL policies during training and after deployment. The RL agent
proposes an action that is verified by computing the data-driven reachability
analysis. If the reachable set of the robot under the proposed action
intersects the unsafe set, we call the data-driven predictive controller to
find the closest safe action to the proposed unsafe action. The safety layer
penalizes the RL agent if the proposed action is unsafe and replaces it with
the closest safe one. In simulation, we show that our method outperforms
state-of-the-art safe RL methods on the robotics navigation problem for a
Turtlebot 3 in Gazebo and a quadrotor in Unreal Engine 4 (UE4).
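A minimal sketch of how such a safety layer could wrap the agent's action proposal, assuming placeholder callables for the data-driven reachability analysis and the data-driven predictive controller (all names below are hypothetical, not the authors' implementation):

```python
class SafetyLayer:
    """Hedged sketch of the action filter described in the abstract.

    reachable_set(state, action)      -> set of states reachable under `action`
                                         (stands in for data-driven reachability analysis)
    hits_unsafe_set(reachable)        -> True if the reachable set intersects the unsafe set
    predictive_controller(state, a)   -> closest safe action to the proposed action `a`
    All three callables are assumptions used only for illustration.
    """

    def __init__(self, reachable_set, hits_unsafe_set, predictive_controller,
                 unsafe_penalty=-1.0):
        self.reachable_set = reachable_set
        self.hits_unsafe_set = hits_unsafe_set
        self.predictive_controller = predictive_controller
        self.unsafe_penalty = unsafe_penalty

    def filter(self, state, proposed_action):
        """Return (action to execute, penalty to add to the RL reward)."""
        reachable = self.reachable_set(state, proposed_action)
        if not self.hits_unsafe_set(reachable):
            # Proposed action verified safe: pass it through unchanged.
            return proposed_action, 0.0
        # Otherwise replace it with the closest safe action and penalize the agent.
        safe_action = self.predictive_controller(state, proposed_action)
        return safe_action, self.unsafe_penalty
```

During training, the returned penalty is added to the environment reward so the agent learns to stop proposing unsafe actions, while the executed command is always the filtered one; the same filter can remain in place after deployment.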
Related papers
- Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward.
arXiv Detail & Related papers (2024-05-04T20:59:06Z)
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration [75.51109230296568]
We argue that extracting an expert policy from offline data to guide online exploration is a promising solution to mitigate the conservativeness issue.
We propose Guided Online Distillation (GOLD), an offline-to-online safe RL framework.
GOLD distills an offline DT policy into a lightweight policy network through guided online safe RL training, which outperforms both the offline DT policy and online safe RL algorithms.
arXiv Detail & Related papers (2023-09-18T00:22:59Z)
- Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions [9.690491406456307]
Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots.
This paper explores the control Lyapunov barrier function (CLBF) to analyze the safety and reachability solely based on data.
We also propose the Lyapunov barrier actor-critic (LBAC), which searches for a controller that satisfies the data-based approximation of the safety and reachability conditions.
arXiv Detail & Related papers (2023-05-16T20:27:02Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions [5.571154223075409]
This paper presents a method for safe and efficient reinforcement learning (RL) using disturbance observers (DOBs) and control barrier functions (CBFs).
Our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions.
Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian processes-based model learning.
arXiv Detail & Related papers (2022-11-30T18:49:53Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta-algorithm, SABRE, which can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- SafeRL-Kit: Evaluating Efficient Reinforcement Learning Methods for Safe Autonomous Driving [12.925039760573092]
We release SafeRL-Kit to benchmark safe RL methods for autonomous driving tasks.
SafeRL-Kit contains several recent algorithms specific to zero-constraint-violation tasks, including Safety Layer, Recovery RL, the off-policy Lagrangian method, and Feasible Actor-Critic.
We conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving.
arXiv Detail & Related papers (2022-06-17T03:23:51Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
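As a rough, hedged illustration of the safety-critic idea in the last entry above (not that paper's actual algorithm), a learned critic that estimates the probability of future constraint violation can be used to screen candidate actions; all function names below are hypothetical:

```python
import numpy as np

def safety_filtered_action(policy, safety_critic, state,
                           risk_threshold=0.1, n_candidates=16,
                           noise_scale=0.1, rng=None):
    """Pick an action whose estimated violation risk is acceptable.

    `policy(state)` returns a nominal action (array-like) and
    `safety_critic(state, action)` returns an estimated probability of
    future constraint violation; both stand in for learned networks.
    """
    rng = rng or np.random.default_rng()
    nominal = np.asarray(policy(state), dtype=float)
    # Candidate actions: the nominal action plus random perturbations of it.
    candidates = [nominal] + [nominal + noise_scale * rng.standard_normal(nominal.shape)
                              for _ in range(n_candidates - 1)]
    risks = np.array([safety_critic(state, a) for a in candidates])
    acceptable = [i for i, r in enumerate(risks) if r <= risk_threshold]
    if acceptable:
        # Among acceptable candidates, stay as close to the nominal action as possible.
        best = min(acceptable, key=lambda i: np.linalg.norm(candidates[i] - nominal))
        return candidates[best]
    # If nothing clears the threshold, fall back to the least risky candidate.
    return candidates[int(np.argmin(risks))]
```

A complete method would also train the critic from observed failures and fold it into the policy objective, which is beyond this sketch.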