Dynamic Shielding for Reinforcement Learning in Black-Box Environments
- URL: http://arxiv.org/abs/2207.13446v1
- Date: Wed, 27 Jul 2022 10:54:05 GMT
- Title: Dynamic Shielding for Reinforcement Learning in Black-Box Environments
- Authors: Masaki Waga, Ezequiel Castellano, Sasinee Pruekprasert, Stefan
Klikovits, Toru Takisaka, and Ichiro Hasuo
- Abstract summary: It is challenging to use reinforcement learning in cyber-physical systems due to the lack of safety guarantees during learning.
This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge.
We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning.
- Score: 2.696933675395521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is challenging to use reinforcement learning (RL) in cyber-physical
systems due to the lack of safety guarantees during learning. Although there
have been various proposals to reduce undesired behaviors during learning, most
of these techniques require prior system knowledge, and their applicability is
limited. This paper aims to reduce undesired behaviors during learning without
requiring any prior system knowledge. We propose dynamic shielding: an
extension of a model-based safe RL technique called shielding using automata
learning. The dynamic shielding technique constructs an approximate system
model in parallel with RL using a variant of the RPNI algorithm and suppresses
undesired explorations due to the shield constructed from the learned model.
Through this combination, potentially unsafe actions can be foreseen before the
agent experiences them. Experiments show that our dynamic shield significantly
decreases the number of undesired events during training.
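The shielding mechanism the abstract describes can be illustrated with a minimal sketch. Note this is a simplification under stated assumptions: the paper learns a Mealy-machine-style model with an RPNI variant that generalizes across traces, whereas the toy `DynamicShield` class below (a hypothetical name, not from the paper) only memorizes exact observed transitions in a discrete environment.

```python
class DynamicShield:
    """Toy stand-in for a dynamic shield: it records a partial transition
    model from observed traces and vetoes actions whose predicted successor
    is a known-unsafe state. Unknown transitions are allowed optimistically
    so that exploration is not blocked."""

    def __init__(self):
        self.model = {}          # (state, action) -> observed next state
        self.unsafe_states = set()

    def observe(self, state, action, next_state, unsafe):
        # Update the approximate system model in parallel with RL.
        self.model[(state, action)] = next_state
        if unsafe:
            self.unsafe_states.add(next_state)

    def safe_actions(self, state, actions):
        # Foresee unsafe actions before the agent experiences them:
        # filter out actions that are known to lead to an unsafe state.
        allowed = [a for a in actions
                   if self.model.get((state, a)) not in self.unsafe_states]
        return allowed or list(actions)  # never leave the agent with no action

# Usage: the RL agent picks only from shield.safe_actions(state, actions).
shield = DynamicShield()
shield.observe(state=0, action="left", next_state=-1, unsafe=True)
shield.observe(state=0, action="right", next_state=1, unsafe=False)
print(shield.safe_actions(0, ["left", "right"]))  # ['right']
```

The key design point carried over from the paper is that the shield is constructed online from the agent's own experience, so no prior system knowledge is required.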
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
This paper proposes Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- A Secure Learning Control Strategy via Dynamic Camouflaging for Unknown Dynamical Systems under Attacks [0.0]
This paper presents a secure reinforcement learning (RL) based control method for unknown linear time-invariant cyber-physical systems (CPSs).
We consider the attack scenario where the attacker learns about the dynamic model during the exploration phase of the learning conducted by the designer.
We propose a dynamic camouflaging based attack-resilient reinforcement learning (ARRL) algorithm which can learn the desired optimal controller for the dynamic system.
arXiv Detail & Related papers (2021-02-01T00:34:38Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.