Online Shielding for Reinforcement Learning
- URL: http://arxiv.org/abs/2212.01861v1
- Date: Sun, 4 Dec 2022 16:00:29 GMT
- Title: Online Shielding for Reinforcement Learning
- Authors: Bettina Könighofer, Julian Rudolf, Alexander Palmisano, Martin Tappler and Roderick Bloem
- Abstract summary: We propose an approach for online safety shielding of RL agents.
During runtime, the shield computes, for each available action, the maximal probability of not violating the safety specification within the next $k$ steps.
Based on this probability and a given threshold, the shield decides whether to block an action from the agent.
- Score: 59.86192283565134
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite the recent impressive results in reinforcement learning (RL), safety
is still one of the major research challenges in RL. RL is a machine-learning
approach to determine near-optimal policies in Markov decision processes
(MDPs). In this paper, we consider the setting where the safety-relevant
fragment of the MDP together with a temporal logic safety specification is
given and many safety violations can be avoided by planning ahead a short time
into the future. We propose an approach for online safety shielding of RL
agents. During runtime, the shield analyses the safety of each available
action. For any action, the shield computes the maximal probability to not
violate the safety specification within the next $k$ steps when executing this
action. Based on this probability and a given threshold, the shield decides
whether to block an action from the agent. Existing offline shielding
approaches compute exhaustively the safety of all state-action combinations
ahead of time, resulting in huge computation times and large memory
consumption. The intuition behind online shielding is to compute at runtime the
set of all states that could be reached in the near future. For each of these
states, the safety of all available actions is analysed and used for shielding
as soon as one of the considered states is reached. Our approach is well suited
for high-level planning problems where the time between decisions can be used
for safety computations and it is sustainable for the agent to wait until these
computations are finished. For our evaluation, we selected a 2-player version
of the classical computer game SNAKE. The game represents a high-level planning
problem that requires fast decisions and the multiplayer setting induces a
large state space, which is computationally expensive to analyse exhaustively.
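To make the shielding rule concrete, the following is a minimal sketch (not the authors' implementation) of the decision described above: for each available action, the maximal probability of staying safe for the next $k$ steps is computed by backward induction over the safety-relevant MDP fragment, and actions whose probability falls below the threshold are blocked. The model interface (`actions`, `succ_distribution`, `is_violation`) and the naive recursion are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumed, hypothetical interface of the safety-relevant MDP fragment:
#   model.actions(state)          -> iterable of actions enabled in `state`
#   model.succ_distribution(s, a) -> list of (next_state, probability) pairs
#   model.is_violation(state)     -> True if `state` violates the safety spec

def max_safety_probability(model, state, horizon):
    """Maximal probability of not violating the safety spec within `horizon` steps."""
    if model.is_violation(state):
        return 0.0
    if horizon == 0:
        return 1.0
    return max(
        (action_safety_probability(model, state, a, horizon) for a in model.actions(state)),
        default=1.0,  # assumption: a state without actions is treated as absorbing and safe
    )

def action_safety_probability(model, state, action, horizon):
    """Safety probability of executing `action` in `state` and acting safely afterwards."""
    return sum(
        prob * max_safety_probability(model, succ, horizon - 1)
        for succ, prob in model.succ_distribution(state, action)
    )

def shield(model, state, horizon, threshold):
    """Allow only actions whose k-step safety probability reaches `threshold`."""
    return [
        a for a in model.actions(state)
        if action_safety_probability(model, state, a, horizon) >= threshold
    ]
```

In the setting described in the abstract, these probabilities would be computed during the time between decisions for all states reachable in the near future, so that the shielding decision at the next state is an immediate lookup; the plain recursion above only illustrates the quantity being computed.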
Related papers
- Long-term Safe Reinforcement Learning with Binary Feedback [5.684409853507594]
Long-term Binary Safe RL (LoBiSaRL) is a safe RL algorithm for constrained Markov decision processes.
Our theoretical results show that LoBiSaRL guarantees the long-term safety constraint with high probability.
arXiv Detail & Related papers (2024-01-08T10:07:31Z)
- Safe POMDP Online Planning via Shielding [6.234405592444883]
Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty.
POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return.
But the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks.
arXiv Detail & Related papers (2023-09-19T00:02:05Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Automata Learning meets Shielding [1.1417805445492082]
Safety is still one of the major research challenges in reinforcement learning (RL).
In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments.
Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach.
arXiv Detail & Related papers (2022-12-04T14:58:12Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta-algorithm, SABRE, which can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Safe Reinforcement Learning by Imagining the Near Future [37.0376099401243]
In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future.
We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions.
Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.
arXiv Detail & Related papers (2022-02-15T23:28:24Z)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs).
We show that Saute MDPs allow viewing the safety-augmentation problem from a different perspective, enabling new features.
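The following is a minimal sketch of the state-augmentation idea, assuming a gym-style environment whose `step` returns an `info` dict with a per-step safety cost under the key `"cost"` (a hypothetical convention); the remaining safety budget is appended to the observation and the reward is replaced by a large penalty once the budget is exhausted. The normalization and the penalty value are illustrative choices, not the paper's exact construction.

```python
import numpy as np

class SafetyAugmentedWrapper:
    """Illustrative safety-state augmentation (not the paper's exact construction).

    Assumes `env.step(action)` returns (obs, reward, done, info) and that
    `info["cost"]` holds the per-step safety cost (a hypothetical convention).
    """

    def __init__(self, env, budget, unsafe_penalty=-1000.0):
        self.env = env
        self.budget = float(budget)          # total allowed safety cost per episode
        self.unsafe_penalty = unsafe_penalty  # illustrative stand-in for "maximally undesirable"
        self.remaining = self.budget

    def reset(self):
        self.remaining = self.budget
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.remaining -= info.get("cost", 0.0)
        if self.remaining < 0.0:
            # Budget exhausted: replace the reward by a large penalty.
            reward = self.unsafe_penalty
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Append the normalized remaining budget to the observation.
        z = np.array([max(self.remaining, 0.0) / self.budget], dtype=np.float32)
        return np.concatenate([np.asarray(obs, dtype=np.float32).ravel(), z])
```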
arXiv Detail & Related papers (2022-02-14T08:57:01Z)
- Safe Reinforcement Learning with Linear Function Approximation [48.75026009895308]
We introduce safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold.
We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation.
We show that SLUCB-QVI and RSLUCB-QVI, while incurring no safety violation, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching
arXiv Detail & Related papers (2021-06-11T08:46:57Z)
- Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z)