Safer Reinforcement Learning through Transferable Instinct Networks
- URL: http://arxiv.org/abs/2107.06686v1
- Date: Wed, 14 Jul 2021 13:22:04 GMT
- Title: Safer Reinforcement Learning through Transferable Instinct Networks
- Authors: Djordje Grbic and Sebastian Risi
- Abstract summary: We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR^2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR^2L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random exploration is one of the main mechanisms through which reinforcement
learning (RL) finds well-performing policies. However, it can lead to
undesirable or catastrophic outcomes when learning online in safety-critical
environments. In fact, safe learning is one of the major obstacles towards
real-world agents that can learn during deployment. One way of ensuring that
agents respect hard limitations is to explicitly configure boundaries in which
they can operate. While this might work in some cases, we do not always have
clear a-priori information which states and actions can lead dangerously close
to hazardous states. Here, we present an approach where an additional policy
can override the main policy and offer a safer alternative action. In our
instinct-regulated RL (IR^2L) approach, an "instinctual" network is trained to
recognize undesirable situations, while guarding the learning policy against
entering them. The instinct network is pre-trained on a single task where it is
safe to make mistakes, and transferred to environments in which learning a new
task safely is critical. We demonstrate IR^2L in the OpenAI Safety Gym domain,
in which it incurs significantly fewer safety violations during
training than a baseline RL approach while reaching similar task performance.
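The override mechanism the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' architecture: the network weights, the danger score, and the override threshold are all hypothetical stand-ins for the pre-trained instinct network and the learning policy.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM = 4, 2
W_policy = rng.normal(size=(OBS_DIM, ACT_DIM))  # stand-in task policy weights
W_safe = rng.normal(size=(OBS_DIM, ACT_DIM))    # stand-in instinct action weights
w_danger = rng.normal(size=OBS_DIM)             # stand-in danger-score weights

DANGER_THRESHOLD = 0.5  # assumed cutoff for overriding the policy


def policy_action(obs):
    # Stand-in for the learned task policy.
    return np.tanh(obs @ W_policy)


def instinct(obs):
    # Stand-in for the pre-trained instinct network: a danger score
    # in [0, 1] plus a safer alternative action.
    danger = 1.0 / (1.0 + np.exp(-(obs @ w_danger)))
    safe_action = np.tanh(obs @ W_safe)
    return danger, safe_action


def act(obs):
    # The instinct overrides the learning policy only in states it
    # recognizes as undesirable; otherwise the policy acts freely.
    a = policy_action(obs)
    danger, safe_a = instinct(obs)
    return safe_a if danger > DANGER_THRESHOLD else a


obs = rng.normal(size=OBS_DIM)
print(act(obs).shape)  # (2,)
```

Because the instinct network is pre-trained on a task where mistakes are cheap and then frozen, the same override logic can guard a fresh policy learning a new task.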
Related papers
- Reinforcement Learning by Guided Safe Exploration [11.14908712905592]
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal.
This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal.
We also regularize the target policy (the student) towards the guide while the student is still unreliable, and gradually eliminate the guide's influence.
arXiv Detail & Related papers (2023-07-26T17:26:21Z)
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention [17.017957942831938]
Current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task.
We introduce a new two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA).
arXiv Detail & Related papers (2021-10-27T14:35:00Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.