Safe Reinforcement Learning via Shielding for POMDPs
- URL: http://arxiv.org/abs/2204.00755v1
- Date: Sat, 2 Apr 2022 03:51:55 GMT
- Title: Safe Reinforcement Learning via Shielding for POMDPs
- Authors: Steven Carr, Nils Jansen, Sebastian Junges and Ufuk Topcu
- Abstract summary: Reinforcement learning (RL) in safety-critical environments requires an agent to avoid decisions with catastrophic consequences.
We propose and thoroughly evaluate a tight integration of formally-verified shields for POMDPs with state-of-the-art deep RL algorithms.
We empirically demonstrate that an RL agent using a shield, beyond being safe, converges to higher values of expected reward.
- Score: 29.058332307331785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) in safety-critical environments requires an agent
to avoid decisions with catastrophic consequences. Various approaches
addressing the safety of RL exist to mitigate this problem. In particular,
so-called shields provide formal safety guarantees on the behavior of RL agents
based on (partial) models of the agents' environment. Yet, the state-of-the-art
generally assumes perfect sensing capabilities of the agents, which is
unrealistic in real-life applications. The standard models to capture scenarios
with limited sensing are partially observable Markov decision processes
(POMDPs). Safe RL for these models remains an open problem so far. We propose
and thoroughly evaluate a tight integration of formally-verified shields for
POMDPs with state-of-the-art deep RL algorithms and create an efficacious
method that safely learns policies under partial observability. We empirically
demonstrate that an RL agent using a shield, beyond being safe, converges to
higher values of expected reward. Moreover, shielded agents need an order of
magnitude fewer training episodes than unshielded agents, especially in
challenging sparse-reward settings.
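To make the shielding idea concrete, here is a minimal, hypothetical Python sketch of the pattern the abstract describes: a belief-support tracker for the POMDP plus a precomputed safe-action table that filters the learner's proposed actions. All names (`BeliefSupportTracker`, `Shield`, `shielded_step`, the `env`/`agent_policy` interfaces) are illustrative assumptions, not the paper's actual implementation or API.

```python
import random
from typing import Dict, FrozenSet, List


class BeliefSupportTracker:
    """Tracks the set of states the system could currently occupy (the belief support)."""

    def __init__(self, initial_states, transitions, observation_of):
        # transitions: dict mapping (state, action) -> iterable of possible successor states
        # observation_of: dict mapping state -> observation emitted in that state
        self.support = frozenset(initial_states)
        self.transitions = transitions
        self.observation_of = observation_of

    def update(self, action, observation):
        # Propagate the support through the chosen action, then keep only the
        # successors consistent with the received observation.
        successors = set()
        for state in self.support:
            successors.update(self.transitions.get((state, action), ()))
        self.support = frozenset(s for s in successors if self.observation_of[s] == observation)


class Shield:
    """Maps a belief support to the actions considered safe in that support.

    In the paper this information would come from formal verification of the POMDP;
    here it is simply a precomputed lookup supplied by the caller.
    """

    def __init__(self, safe_table: Dict[FrozenSet[int], List[int]], all_actions: List[int]):
        self.safe_table = safe_table
        self.all_actions = all_actions

    def safe_actions(self, support: FrozenSet[int]) -> List[int]:
        return self.safe_table.get(support, list(self.all_actions))


def shielded_step(agent_policy, tracker, shield, env):
    """One interaction step: the learner proposes an action; if the shield deems it
    unsafe for the current belief support, a safe action is substituted instead."""
    allowed = shield.safe_actions(tracker.support)
    proposed = agent_policy(tracker.support)
    action = proposed if proposed in allowed else random.choice(allowed)
    observation, reward, done = env.step(action)
    tracker.update(action, observation)
    return action, reward, done
```

In this sketch the RL agent only ever executes actions the shield allows for the current belief support, which is why training remains safe regardless of how exploratory the underlying policy is.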
Related papers
- A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot [0.0]
Reinforcement Learning (RL) has been shown to be effective and convenient for a number of tasks in robotics.
This work presents a novel safe model-free RL agent called Critic As Lyapunov Function (CALF).
arXiv Detail & Related papers (2024-09-23T10:04:28Z) - Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL)
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Online Safety Property Collection and Refinement for Safe Deep
Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and use them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP allows higher returns and lower violations over previous Safe DRL approaches.
arXiv Detail & Related papers (2023-02-13T21:19:36Z) - Safe Model-Based Reinforcement Learning with an Uncertainty-Aware
Reachability Certificate [6.581362609037603]
We build a safe reinforcement learning framework to resolve constraints required by the DRC and its corresponding shield policy.
We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy.
arXiv Detail & Related papers (2022-10-14T06:16:53Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
InReinforcement learning (RL) has shown a promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z) - Constraint-Guided Reinforcement Learning: Augmenting the
Agent-Environment-Interaction [10.203602318836445]
Reinforcement Learning (RL) agents have great successes in solving tasks with large observation and action spaces from limited feedback.
This paper discusses the engineering of reliable agents via the integration of deep RL with constraint-based augmentation models.
Our results show that constraint-guidance does both provide reliability improvements and safer behavior, as well as accelerated training.
arXiv Detail & Related papers (2021-04-24T10:04:14Z)
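The multiplicative value function entry above describes a safety critic that discounts a reward critic by the estimated probability of constraint violation. Below is a minimal, hypothetical PyTorch sketch of that combination; the class name, network sizes, and interfaces are assumptions and do not reproduce the referenced paper's architecture.

```python
import torch
import torch.nn as nn


class MultiplicativeValueFunction(nn.Module):
    """Combines a reward critic with a safety critic multiplicatively."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        # Reward critic: estimates the return assuming no constraint violations occur.
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Safety critic: estimates the probability of a future constraint violation.
        self.safety_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        violation_prob = self.safety_critic(obs)   # value in [0, 1]
        reward_value = self.reward_critic(obs)     # constraint-free return estimate
        # The reward estimate is discounted by the probability of remaining safe,
        # so states likely to violate constraints receive low combined value.
        return (1.0 - violation_prob) * reward_value
```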
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.