Do Androids Dream of Electric Fences? Safety-Aware Reinforcement
Learning with Latent Shielding
- URL: http://arxiv.org/abs/2112.11490v1
- Date: Tue, 21 Dec 2021 19:11:34 GMT
- Title: Do Androids Dream of Electric Fences? Safety-Aware Reinforcement
Learning with Latent Shielding
- Authors: Peter He, Borja G. Leon, Francesco Belardinelli
- Abstract summary: We present a novel approach to safety-aware deep reinforcement learning in high-dimensional environments called latent shielding.
Latent shielding leverages internal representations of the environment learnt by model-based agents to "imagine" future trajectories and avoid those deemed unsafe.
- Score: 18.54615448101203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing trend of fledgling reinforcement learning systems making their
way into real-world applications has been accompanied by growing concerns for
their safety and robustness. In recent years, a variety of approaches have been
put forward to address the challenges of safety-aware reinforcement learning;
however, these methods often either require a handcrafted model of the
environment to be provided beforehand, or that the environment is relatively
simple and low-dimensional. We present a novel approach to safety-aware deep
reinforcement learning in high-dimensional environments called latent
shielding. Latent shielding leverages internal representations of the
environment learnt by model-based agents to "imagine" future trajectories and
avoid those deemed unsafe. We experimentally demonstrate that this approach
leads to improved adherence to formally-defined safety specifications.
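As a rough illustration of the mechanism the abstract describes, the sketch below "imagines" latent rollouts under a learned world model and discards candidate actions whose predicted trajectories violate safety. Every name here (encode, dynamics, safety_violation) and the toy dimensions are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch of latent shielding, assuming a Dreamer-style world model.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM, HORIZON = 8, 2, 5

def encode(obs):
    """Stand-in encoder: observation -> latent state."""
    return np.tanh(obs[:LATENT_DIM])

def dynamics(z, a):
    """Stand-in learned latent transition model."""
    return np.tanh(z + 0.1 * np.pad(a, (0, LATENT_DIM - ACTION_DIM)))

def safety_violation(z):
    """Stand-in safety classifier on latent states (True = predicted unsafe)."""
    return z[0] > 0.9

def shielded_action(obs, policy, n_candidates=16):
    """Sample candidate actions, imagine each trajectory in latent space,
    and keep only candidates whose imagined rollout stays safe."""
    z0 = encode(obs)
    for _ in range(n_candidates):
        a = policy(z0) + 0.1 * rng.normal(size=ACTION_DIM)  # exploration noise
        z, unsafe = z0, False
        for _ in range(HORIZON):
            z = dynamics(z, a)
            if safety_violation(z):
                unsafe = True
                break
        if not unsafe:
            return a
    return policy(z0)  # fall back if every imagined rollout looked unsafe

policy = lambda z: np.zeros(ACTION_DIM)  # trivial stand-in policy
print("shielded action:", shielded_action(rng.normal(size=16), policy))
```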
Related papers
- Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning [5.593642806259113]
We model a meta-learning process where each task is synchronized with a safeguard that monitors safety and provides a reward signal to the agent.
The design of the safeguard is manual, but it is high-level and model-agnostic, which gives rise to an end-to-end safe learning approach.
We evaluate our framework in a Minecraft-inspired Gridworld, a VizDoom game environment, and an LLM fine-tuning application.
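A minimal, hypothetical sketch of the safeguard idea: a model-agnostic monitor wrapped around a gym-style environment step that checks a high-level safety predicate and contributes a reward signal. The predicate, the penalty, and the 4-tuple step API are illustrative assumptions, not the paper's actual design.

```python
class Safeguard:
    """Monitors a task-specific safety predicate and shapes the reward."""

    def __init__(self, is_safe, penalty=-1.0):
        self.is_safe = is_safe    # high-level, model-agnostic predicate
        self.penalty = penalty

    def step(self, env, action):
        obs, reward, done, info = env.step(action)
        if not self.is_safe(obs):
            reward += self.penalty        # safeguard's reward signal
            info["unsafe"] = True
        return obs, reward, done, info

# Usage: guard = Safeguard(is_safe=lambda obs: obs["lava"] == 0)
#        obs, reward, done, info = guard.step(env, action)
```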
arXiv Detail & Related papers (2024-10-31T16:28:33Z)
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
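A hedged sketch of permissibility-based shielding: before execution, actions the shield judges impermissible in the current state are masked out, and the agent optimizes only over what remains. The permissible() predicate and the fallback are illustrative stand-ins, not the paper's construction.

```python
def shield(state, actions, permissible, fallback):
    """Return the permissible subset of actions (or a safe fallback)."""
    allowed = [a for a in actions if permissible(state, a)]
    return allowed if allowed else [fallback(state)]

# A Q-learning agent would then pick argmax_a Q(state, a) over
# shield(state, actions, permissible, fallback) rather than the full set.
```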
arXiv Detail & Related papers (2024-05-29T18:00:21Z)
- Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- State-Wise Safe Reinforcement Learning With Pixel Observations [12.338614299403305]
We propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions.
As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations.
We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return.
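An illustrative sketch of a barrier-like function over latent states, in the spirit of the joint learning framework just described (this is a generic control-barrier-style construction; the linear parameterization and names are assumptions, not the paper's exact formulation).

```python
import numpy as np

rng = np.random.default_rng(1)
w = 0.1 * rng.normal(size=8)   # parameters of a linear barrier B(z) = w . z

def barrier(z):
    # Convention here: B(z) <= 0 is treated as safe, B(z) > 0 as unsafe.
    return float(w @ z)

def barrier_loss(z, z_next, unsafe):
    # Classification term: push B positive on unsafe latents, negative on
    # safe ones. Decrease term: discourage B from growing along transitions.
    sign = 1.0 if unsafe else -1.0
    classification = max(0.0, 1.0 - sign * barrier(z))
    decrease = max(0.0, barrier(z_next) - 0.9 * barrier(z))
    return classification + decrease

# In a joint framework like the one above, this loss would be minimised
# together with the latent dynamics model and the policy objective.
print(barrier_loss(rng.normal(size=8), rng.normal(size=8), unsafe=False))
```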
arXiv Detail & Related papers (2023-11-03T20:32:30Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning [3.9821399546174825]
We introduce a deep reinforcement learning framework for safe decision making in uncertain environments.
We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems.
In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
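A hedged sketch of risk-aversion over model uncertainty: given return estimates from an ensemble of plausible dynamics models, score an action by the conditional value-at-risk (CVaR) of the ensemble rather than its mean. This is a generic instantiation of the idea, not necessarily the paper's exact formulation.

```python
import numpy as np

def cvar(values, alpha=0.2):
    """Mean of the worst alpha-fraction of outcomes (lower tail)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, int(np.ceil(alpha * len(v))))
    return v[:k].mean()

per_model_returns = [3.1, 2.7, -1.0, 2.9, 3.0]  # hypothetical ensemble estimates
print(cvar(per_model_returns))                  # risk-averse score: -1.0
```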
arXiv Detail & Related papers (2023-01-30T00:37:06Z)
- Guiding Safe Exploration with Weakest Preconditions [15.469452301122177]
In reinforcement learning for safety-critical settings, it is desirable for the agent to obey safety constraints at all points in time.
We present a novel neurosymbolic approach called SPICE to solve this safe exploration problem.
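A sketch of the weakest-precondition idea behind this kind of approach, under strong simplifying assumptions (known 1-D linear dynamics x' = a*x + b*u with b > 0; all names are illustrative, not SPICE's implementation). The weakest precondition of the safety constraint lo <= x' <= hi is an interval of actions, and the policy's proposal is projected into it before execution.

```python
def wp_interval(x, a, b, lo, hi):
    """Weakest precondition on u such that lo <= a*x + b*u <= hi."""
    return (lo - a * x) / b, (hi - a * x) / b

def safe_action(u, x, a=1.0, b=0.5, lo=-1.0, hi=1.0):
    u_min, u_max = wp_interval(x, a, b, lo, hi)
    return min(max(u, u_min), u_max)   # project the proposed action

print(safe_action(u=2.0, x=0.8))  # clipped to 0.4 so the next state stays <= 1
```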
arXiv Detail & Related papers (2022-09-28T14:58:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.