Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and
Generalization Guarantees
- URL: http://arxiv.org/abs/2201.08355v4
- Date: Sat, 1 Apr 2023 17:36:34 GMT
- Title: Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and
Generalization Guarantees
- Authors: Kai-Chieh Hsu, Allen Z. Ren, Duy Phuong Nguyen, Anirudha Majumdar,
Jaime F. Fisac
- Abstract summary: Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world.
We propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution.
- Score: 7.6347172725540995
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Safety is a critical component of autonomous systems and remains a challenge
for learning-based policies to be utilized in the real world. In particular,
policies learned using reinforcement learning often fail to generalize to novel
environments due to unsafe behavior. In this paper, we propose
Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically
guaranteed safety-aware policy distribution. To improve safety, we apply a dual
policy setup where a performance policy is trained using the cumulative task
reward and a backup (safety) policy is trained by solving the Safety Bellman
Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab
transfer, we apply a supervisory control scheme to shield unsafe actions during
exploration; in Lab-to-Real transfer, we leverage the Probably Approximately
Correct (PAC)-Bayes framework to provide lower bounds on the expected
performance and safety of policies in unseen environments. Additionally,
inheriting from the HJ reachability analysis, the bound accounts for the
expectation over the worst-case safety in each environment. We empirically
study the proposed framework for ego-vision navigation in two types of indoor
environments with varying degrees of photorealism. We also demonstrate strong
generalization performance through hardware experiments in real indoor spaces
with a quadrupedal robot. See
https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary
material.
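As context for the backup (safety) policy mentioned above, one common discounted form of the Safety Bellman Equation derived from HJ reachability analysis is sketched below; this is the standard formulation from the literature, not necessarily the paper's exact one, with margin \ell(s) positive outside the failure set and dynamics f:

    V(s) = (1 - \gamma)\,\ell(s) + \gamma \min\{\ell(s),\ \max_a V(f(s, a))\}

The backup policy then acts to maximize V, keeping the worst-case safety margin as large as possible. The Sim-to-Lab shielding can likewise be pictured as a simple supervisory rule: the performance policy proposes an action, and the backup policy overrides it whenever the safety critic deems the proposal too risky. The Python sketch below is illustrative only; the function names, the sign convention (higher critic value = safer), and the threshold are assumptions, not the paper's actual interfaces.

    def shielded_action(state, perf_policy, backup_policy, safety_critic, threshold=0.0):
        """Supervisory (shielded) action selection: use the performance policy's
        action unless the safety critic flags it as potentially unsafe."""
        proposal = perf_policy(state)  # action chosen to maximize cumulative task reward
        # The safety critic approximates the HJ reachability value of (state, proposal);
        # values at or below the threshold suggest the proposal may reach the failure set.
        if safety_critic(state, proposal) <= threshold:
            return backup_policy(state)  # fall back to the safety (backup) policy
        return proposal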
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Safe Reinforcement Learning in a Simulated Robotic Arm [0.0]
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies.
In this paper, we extend the applicability of safe RL algorithms by creating a customized environment with a Panda robotic arm.
arXiv Detail & Related papers (2023-11-28T19:22:16Z)
- Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark [12.660770759420286]
We present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios.
We offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms.
arXiv Detail & Related papers (2023-10-19T08:19:28Z)
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to tasks in realistic environments.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic, which only estimates constraint-free returns (a minimal sketch of this composition appears after this list).
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Fail-Safe Adversarial Generative Imitation Learning [9.594432031144716]
We propose a safety layer that enables a closed-form probability density/gradient of the safe generative continuous policy, end-to-end generative adversarial training, and worst-case safety guarantees.
The safety layer maps all actions into a set of safe actions, and uses the change-of-variables formula plus additivity of measures for the density.
In an experiment on real-world driver interaction data, we empirically demonstrate tractability, safety and imitation performance of our approach.
arXiv Detail & Related papers (2022-03-03T13:03:06Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
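As a minimal sketch of the multiplicative value function idea from the entry above (assumed composition and names, not the authors' implementation): the reward critic's constraint-free return is discounted by the safety critic's predicted probability of constraint violation.

    def multiplicative_value(state, action, reward_critic, safety_critic):
        """Assumed composition of a multiplicative value function: the reward
        critic's constraint-free return is discounted by the safety critic's
        predicted probability of a future constraint violation."""
        q_reward = reward_critic(state, action)      # expected return ignoring constraints
        p_violation = safety_critic(state, action)   # predicted P(constraint violation)
        return (1.0 - p_violation) * q_reward        # risky actions receive discounted values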