Reinforcement Learning by Guided Safe Exploration
- URL: http://arxiv.org/abs/2307.14316v1
- Date: Wed, 26 Jul 2023 17:26:21 GMT
- Title: Reinforcement Learning by Guided Safe Exploration
- Authors: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
- Abstract summary: We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal.
This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal.
We regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the guide's influence as training progresses.
- Score: 11.14908712905592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety is critical to broadening the application of reinforcement learning
(RL). Often, we train RL agents in a controlled environment, such as a
laboratory, before deploying them in the real world. However, the real-world
target task might be unknown prior to deployment. Reward-free RL trains an
agent without the reward to adapt quickly once the reward is revealed. We
consider the constrained reward-free setting, where an agent (the guide) learns
to explore safely without the reward signal. This agent is trained in a
controlled environment, which allows unsafe interactions and still provides the
safety signal. After the target task is revealed, safety violations are not
allowed anymore. Thus, the guide is leveraged to compose a safe behaviour
policy. Drawing from transfer learning, we also regularize a target policy (the
student) towards the guide while the student is unreliable and gradually
eliminate the influence of the guide as training progresses. The empirical
analysis shows that this method can achieve safe transfer learning and helps
the student solve the target task faster.
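To make the transfer mechanism concrete, the following is a minimal Python sketch of the idea described in the abstract: the student's training objective is regularized towards the guide's action distribution, and the regularization weight is annealed to zero so the guide's influence disappears. The KL penalty, the linear annealing schedule, and every name in the snippet (guide_regularized_loss, beta0) are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

def guide_regularized_loss(task_loss, student_probs, guide_probs,
                           step, total_steps, beta0=1.0):
    """Student objective plus a KL penalty pulling the student's action
    distribution towards the guide's. The weight beta is annealed to zero,
    gradually eliminating the guide's influence (linear schedule assumed)."""
    beta = beta0 * max(0.0, 1.0 - step / total_steps)
    kl = float(np.sum(guide_probs * np.log(guide_probs / student_probs)))
    return task_loss + beta * kl

# Hypothetical discrete action distributions for a single state.
guide = np.array([0.7, 0.2, 0.1])    # safe exploration policy (the guide)
student = np.array([0.4, 0.4, 0.2])  # target-task policy (the student)

for step in (0, 5000, 10000):
    loss = guide_regularized_loss(1.3, student, guide, step, total_steps=10000)
    print(f"step={step:5d} loss={loss:.3f}")
```
At step 0 the penalty dominates and pulls the student towards the guide; by the final step the loss reduces to the task term alone.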
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding [5.5929450570003185]
Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable.
We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training.
arXiv Detail & Related papers (2024-05-28T13:47:21Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic, which only estimates constraint-free returns (a minimal sketch of this multiplicative form appears after the related papers list).
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Don't do it: Safer Reinforcement Learning With Rule-based Guidance [2.707154152696381]
During training, reinforcement learning systems interact with the world without considering the safety of their actions.
We propose a new safe epsilon-greedy algorithm that uses safety rules to override agents' actions if they are considered unsafe (see the sketch after this list).
arXiv Detail & Related papers (2022-12-28T13:42:56Z) - SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z) - DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention [17.017957942831938]
Current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task.
We introduce a new two-player framework for safe RL called the Distributive Exploration Safety Training Algorithm (DESTA).
arXiv Detail & Related papers (2021-10-27T14:35:00Z) - Safer Reinforcement Learning through Transferable Instinct Networks [6.09170287691728]
We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR2L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations.
arXiv Detail & Related papers (2021-07-14T13:22:04Z) - Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z) - Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)
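As referenced above, the entry for "A Multiplicative Value Function for Safe and Efficient Reinforcement Learning" describes a safety critic that discounts a reward critic. Below is a minimal sketch under one possible reading of that description; the plain product form and the names p_violation and reward_value are assumptions, not taken from that paper.
```python
def multiplicative_value(p_violation: float, reward_value: float) -> float:
    """Combine a safety critic and a reward critic: the constraint-free
    return estimate is discounted by the predicted chance of staying within
    the constraints. The simple product is an assumed, simplified reading."""
    assert 0.0 <= p_violation <= 1.0
    return (1.0 - p_violation) * reward_value

# Hypothetical critic outputs for one state-action pair.
print(multiplicative_value(p_violation=0.1, reward_value=12.0))  # 10.8
```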
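Similarly, the rule-based guidance entry above describes a safe epsilon-greedy scheme in which safety rules override unsafe actions. The sketch below is a self-contained illustration of that pattern; the is_unsafe and safe_fallback rules are hypothetical stand-ins for the paper's safety rules.
```python
import random

def safe_epsilon_greedy(q_values, is_unsafe, safe_fallback, epsilon=0.1):
    """Standard epsilon-greedy selection followed by a rule-based override:
    if the chosen action is flagged unsafe, it is replaced by a safe one."""
    actions = list(range(len(q_values)))
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: q_values[a])
    return safe_fallback(action) if is_unsafe(action) else action

# Toy usage: action 2 is declared unsafe and mapped back to action 0.
print(safe_epsilon_greedy([1.0, 0.5, 2.0],
                          is_unsafe=lambda a: a == 2,
                          safe_fallback=lambda a: 0))
```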
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.