Dynamic Shielding for Reinforcement Learning in Black-Box Environments
- URL: http://arxiv.org/abs/2207.13446v1
- Date: Wed, 27 Jul 2022 10:54:05 GMT
- Title: Dynamic Shielding for Reinforcement Learning in Black-Box Environments
- Authors: Masaki Waga, Ezequiel Castellano, Sasinee Pruekprasert, Stefan
Klikovits, Toru Takisaka, and Ichiro Hasuo
- Abstract summary: It is challenging to use reinforcement learning in cyber-physical systems due to the lack of safety guarantees during learning.
This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge.
We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning.
- Score: 2.696933675395521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is challenging to use reinforcement learning (RL) in cyber-physical
systems due to the lack of safety guarantees during learning. Although there
have been various proposals to reduce undesired behaviors during learning, most
of these techniques require prior system knowledge, and their applicability is
limited. This paper aims to reduce undesired behaviors during learning without
requiring any prior system knowledge. We propose dynamic shielding: an
extension of a model-based safe RL technique called shielding using automata
learning. The dynamic shielding technique constructs an approximate system
model in parallel with RL using a variant of the RPNI algorithm and suppresses
undesired explorations due to the shield constructed from the learned model.
Through this combination, potentially unsafe actions can be foreseen before the
agent experiences them. Experiments show that our dynamic shield significantly
decreases the number of undesired events during training.
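The shielding mechanism the abstract describes can be illustrated with a minimal sketch. Note this is a simplification under stated assumptions: the paper learns a Mealy-machine-style model with an RPNI variant that generalizes across traces, whereas the toy `DynamicShield` class below (a hypothetical name, not from the paper) only memorizes exact observed transitions in a discrete environment.

```python
class DynamicShield:
    """Toy stand-in for a dynamic shield: it records a partial transition
    model from observed traces and vetoes actions whose predicted successor
    is a known-unsafe state. Unknown transitions are allowed optimistically
    so that exploration is not blocked."""

    def __init__(self):
        self.model = {}          # (state, action) -> observed next state
        self.unsafe_states = set()

    def observe(self, state, action, next_state, unsafe):
        # Update the approximate system model in parallel with RL.
        self.model[(state, action)] = next_state
        if unsafe:
            self.unsafe_states.add(next_state)

    def safe_actions(self, state, actions):
        # Foresee unsafe actions before the agent experiences them:
        # filter out actions that are known to lead to an unsafe state.
        allowed = [a for a in actions
                   if self.model.get((state, a)) not in self.unsafe_states]
        return allowed or list(actions)  # never leave the agent with no action

# Usage: the RL agent picks only from shield.safe_actions(state, actions).
shield = DynamicShield()
shield.observe(state=0, action="left", next_state=-1, unsafe=True)
shield.observe(state=0, action="right", next_state=1, unsafe=False)
print(shield.safe_actions(0, ["left", "right"]))  # ['right']
```

The key design point carried over from the paper is that the shield is constructed online from the agent's own experience, so no prior system knowledge is required.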
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
This paper proposes Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- A Secure Learning Control Strategy via Dynamic Camouflaging for Unknown Dynamical Systems under Attacks [0.0]
This paper presents a secure reinforcement learning (RL) based control method for unknown linear time-invariant cyber-physical systems (CPSs).
We consider the attack scenario where the attacker learns about the dynamic model during the exploration phase of the learning conducted by the designer.
We propose a dynamic camouflaging based attack-resilient reinforcement learning (ARRL) algorithm which can learn the desired optimal controller for the dynamic system.
arXiv Detail & Related papers (2021-02-01T00:34:38Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.